Abstract

An adversary who has obtained the cryptographic hash of a user’s password can mount an offline 
attack to crack the password by comparing this hash value with the cryptographic hashes of likely 
password guesses. This offline attacker is limited only by the resources he is willing to invest to crack 
the password. Key-stretching techniques like hash iteration and memory hard functions have been 
proposed to mitigate the threat of offline attacks by making each password guess more expensive for 
the adversary to verify. However, these techniques also increase costs for a legitimate authentication 
server. We introduce a novel Stackelberg game model which captures the essential elements of this 
interaction between a defender and an offline attacker. In the game the defender first commits to 
a key-stretching mechanism, and the offline attacker responds in a manner that optimizes his utility 
(expected reward minus expected guessing costs). We then introduce Cost Asymmetric Secure Hash 
(CASH), a randomized key-stretching mechanism that minimizes the fraction of passwords that would be 
cracked by a rational offline attacker without increasing amortized authentication costs for the legitimate 
authentication server. CASH is motivated by the observation that the legitimate authentication server 
will typically run the authentication procedure to verify a correct password, while an offline adversary will 
typically use incorrect password guesses. By using randomization we can ensure that the amortized cost 
of running CASH to verify a correct password guess is significantly smaller than the cost of rejecting an 
incorrect password. Using our Stackelberg game framework we can quantify the quality of the underlying 
CASH running time distribution in terms of the fraction of passwords that a rational offline adversary 
would crack. We provide an efficient algorithm to compute high quality CASH distributions for the 
defender. Finally, we analyze CASH using empirical data from two large scale password frequency 
datasets. Our analysis shows that CASH can significantly reduce (by up to 50%) the fraction of passwords 
cracked by a rational offline adversary. 


1 Introduction 


In recent years the authentication servers at major companies like eBay, Zappos, Sony, LinkedIn and 
Adobe have been breached. These breaches have resulted in the release of the cryptographic hashes of 
millions of user passwords, each of which has significant economic value to adversaries [36, 60]. An adversary 
who has obtained the cryptographic hash of a user’s password can mount a fully automated attack to crack 
the user’s password by comparing this hash value to the cryptographic hashes of likely password guesses [31]. 
This offline attacker can try as many password guesses as he likes; he is only limited by the resources that 
he is willing to invest to crack the password. 

Offline attacks are becoming increasingly dangerous due to a combination of several different factors. 
First, improvements in computing hardware make password cracking cheaper (e.g., [60]). Second, empirical 
data indicates that many users tend to select low entropy passwords [20, 32, 56]. Finally, offline adversaries 
now have a wealth of training data available from previous password breaches [37] so the adversary often 
has very accurate background knowledge about the structure of popular passwords. 

Password hash functions like PBKDF2 [43], BCRYPT [54], Argon2 [12] and SCRYPT [51] employ 
key-stretching [48] to make it more expensive for an offline adversary to crack a hashed password. While 
key-stretching may reduce the number of password guesses that the adversary is able to try, the legitimate 
authentication server faces a basic trade-off: it must also pay an increased cost every time a user 
authenticates. 

The basic observation behind our work is that it is possible for the legitimate authentication server to 
use randomization to gain an advantage in this cat-and-mouse game. The offline adversary will spend most 
of his time guessing incorrect passwords, while the authentication server will primarily authenticate users 
with correct passwords. Therefore, it would be desirable to have an authentication procedure whose cost 
is asymmetric. That is, the cost of rejecting an incorrect password is greater than the cost of accepting a 
correct password. This same basic observation lay behind Manber’s proposal to use secret salt values (e.g., 
“pepper”) [46]. For example, the server might store the cryptographic hash $H(pwd, t)$ for a uniformly random 
value $t \in \{1, \ldots, m\}$ called the “pepper”. An offline adversary will need to compute the hash function $m$ 
times in total to reject an incorrect password $pwd'$, while the legitimate authentication server will only need 
to compute it $(m+1)/2$ times on average to accept a correct password. 
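The cost asymmetry of peppering can be illustrated with a minimal Python sketch. SHA-256 stands in for $H$, and the helper names (`store`, `verify`, `expected_accept_cost`) are illustrative assumptions rather than part of the proposal in [46].

```python
import hashlib
import secrets


def H(pwd: str, t: int) -> bytes:
    """One evaluation of the hash on (pwd, t); SHA-256 is only a stand-in."""
    return hashlib.sha256(f"{pwd}|{t}".encode()).digest()


def store(pwd: str, m: int) -> bytes:
    """Pick a uniform pepper t in {1, ..., m}; only the hash is kept, t is discarded."""
    t = secrets.randbelow(m) + 1
    return H(pwd, t)


def verify(guess: str, stored: bytes, m: int) -> tuple[bool, int]:
    """Try t = 1..m and return (accepted, number of hash evaluations used)."""
    for t in range(1, m + 1):
        if H(guess, t) == stored:
            return True, t
    return False, m  # rejecting always costs the full m evaluations


def expected_accept_cost(m: int) -> float:
    """Average evaluations needed to accept a correct password: (m + 1) / 2."""
    return sum(range(1, m + 1)) / m
```

A rejected guess always pays for all $m$ evaluations, whereas an accepted guess stops at the hidden value of $t$, which is the asymmetry the paragraph above describes.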

We introduce Cost Asymmetric Secure Hash (CASH), a mechanism for protecting passwords against offline 
attacks while minimizing expected costs to the legitimate authentication server. CASH may be viewed as 
a simple, yet powerful, extension of [46] in which the distribution over $t$ is not necessarily uniform; the 
“peppering” idea of Manber [46] is a special case of our mechanism in which the distribution over $t$ is uniform. 

In this paper we seek to address the following questions: How can we quantify the security gains (losses) 
from the use of secret salt values? What distribution over the secret salt value (t) is optimal for the 
authentication server? Is there an efficient algorithm to compute this distribution? Does CASH perform 
better than “pepper” or deterministic key stretching? 


Contributions We first introduce a Stackelberg (leader-follower) game which captures the essential aspects 
of our password setting. Our Stackelberg model can provide helpful guidance for the authentication server 
by predicting whether or not (a particular level of) key-stretching will significantly reduce the number of 
passwords that would be cracked by a rational offline adversary in the event of a server breach. In our 
Stackelberg game the authentication server (leader) first commits to a password hashing strategy, and the 
offline adversary (follower) gets to play his best response to the server’s (leader’s) action. That is, the 
adversary selects a threshold $B$ and begins guessing passwords until he either 1) cracks the user’s password, 
or 2) gives up after expending $B$ units of work. The adversary will select a threshold $B$ that maximizes his 
utility.¹ 

Next we give an efficient algorithm for computing good strategies for the leader (authentication server) in 
this Stackelberg game. The defender wants to find a distribution $\tilde{p}_1 \geq \ldots \geq \tilde{p}_m \geq 0$ over the secret running 
time parameter $t \in \{1, \ldots, m\}$ which minimizes the number of passwords that an offline adversary would 
crack. When choosing this distribution, the defender is given a constraint (e.g., $\mathbf{E}[t] = \sum_{t'} t' \cdot \tilde{p}_{t'} \leq C_{max}$) 
bounding the server’s amortized authentication costs. 

Unfortunately, there are no known polynomial time algorithms to compute the Stackelberg equilibrium of 
our game as this problem reduces to a non-convex optimization problem.² However, we develop an efficient 
algorithm to solve a closely related goal: find the CASH distribution which minimizes the success rate of an 
adversary with a fixed budget $B$ per user. While this new goal is not equivalent to the Stackelberg equilibrium, 
our experimental results indicate that the resulting CASH distributions translate to good strategies in 
the original Stackelberg game. At a technical level we show that this new optimization problem can be 
expressed as a linear program. The key technical challenge in solving this linear program is that it has 
exponentially many constraints. Fortunately, this linear program can still be solved in polynomial time 
using an efficient separation oracle that we develop. We also develop a practical algorithm which can quickly 
find the (approximately) optimal CASH distribution against a budget $B$ adversary. The algorithm is efficient 
enough to run on large real world instances (e.g., a dataset of 70 million passwords). 

¹Intuitively, the adversary’s utility is his expected reward (the value of a cracked password times the probability he cracks 
it) minus his expected guessing costs (given by the expected number of times that the adversary needs to evaluate the hash 
function before he succeeds or gives up). 

²By contrast, fixing any CASH distribution $\tilde{p}_1 \geq \ldots \geq \tilde{p}_m$ it is easy to compute the adversary’s best response. 

Finally, we evaluated CASH using password frequency data from the RockYou password breach and from 
a (perturbed) dataset of 70 million Yahoo! passwords [16, 20]. Our analysis shows that CASH significantly 
outperforms the traditional (deterministic) key-stretching defense as well as the “peppering” defense of [46]. 
In some instances, CASH reduced the fraction of passwords cracked by a rational adversary by about 50% 
in comparison to both pepper and traditional key-stretching algorithms. 


2 Background 

Before we introduce the basic CASH mechanism it is necessary to introduce some notation (Section 2.1) and 
review the traditional password based authentication process (Section 2.2). 

2.1 Notation. 

We use $H$ to denote a cryptographic hash function and we let $\mathrm{Cost}(H)$ denote the cost of evaluating $H$ one 
time. To simplify the presentation we will assume that all other costs have been scaled so that $\mathrm{Cost}(H) = 1$. 
We use $H^k$ to denote a hash function that is $k$-times as expensive to compute.³ We use $\mathcal{P}$ to denote the 
space of passwords that users may select, and we use $n$ to denote the number of passwords in this space. 
We use $p_i$ to denote the probability that a random user selects the password $pwd_i \in \mathcal{P}$. For notational 
convenience, we assume that the passwords have been sorted so that $p_1 \geq \ldots \geq p_n$. Given a set $S$ we will 
write $x \xleftarrow{\$} S$ to denote a uniformly random sample from the set $S$. 

Table 1 contains a summary of the notation used throughout this paper. Some of this notation will be 
introduced later in the paper when it is first used. 

2.2 Traditional Password Authentication. 

We begin by giving a brief overview of the traditional password authentication process. Suppose that a user 
registers for an account with username $u$ and password $pwd_u \in \mathcal{P}$. Typically, an authentication server will 
store a record like the following: $(u, s_u, k, H^k(pwd_u, s_u))$. Here, $s_u \xleftarrow{\$} \{0,1\}^L$ is a random $L$-bit salt value 
used to prevent rainbow table attacks and the parameter $k$ controls the cost of the hash function. We 
stress that the salt value $s_u$ and the cost parameter $k$ are stored on the server in the clear so an adversary who 
breaches the authentication server will learn both of these values. We use the notation $s_u$ to emphasize that 
this salt value is different for each user $u$. The parameter $k$ is selected subject to the constraint that $k \leq C_{max}$, 
the maximum amortized cost that the authentication server is willing to incur for authentication.⁴ 

When the user authenticates he will type in his username $u$ and a password guess $pwd'_u \in \mathcal{P}$. The 
authentication server first finds the record $(u, s_u, k, H^k(pwd_u, s_u))$. It then computes $H^k(pwd'_u, s_u)$ and 
verifies that it matches the stored hash value $H^k(pwd_u, s_u)$. Note that authentication will always be 
successful when the user’s password is correct (i.e., $pwd'_u = pwd_u$) because the hash values $H^k(pwd'_u, s_u)$ and 
$H^k(pwd_u, s_u)$ must match in this case. Similarly, if the user’s password is incorrect (i.e., $pwd_u \neq pwd'_u$) then 
authentication will fail with high probability because the cryptographic hash function $H$ is collision resistant. 

³In this work we will not focus on the lower level issue of which key-stretching techniques are used. However, this is an 
important research area [1] and we would strongly advocate for the use of modern key-stretching techniques like memory hard 
functions. BCRYPT [54] and PBKDF2 [43] use hash iteration for key-stretching. In this case the cost parameter $k$ specifies 
the number of hash iterations. For example, if $k = 2$ the authentication server would store the tuple $(k = 2, H(H(pwd)))$. The 
disadvantage to this approach is that a hash function $H$ might cost orders of magnitude less to evaluate on an Application 
Specific Integrated Circuit than it would cost to evaluate on a more traditional architecture. By contrast, memory costs 
tend to be relatively stable across different architectures [33], which motivates the use of memory hard functions for password 
hashing [50]. Argon2 [12], winner of the recently completed password hashing competition, and SCRYPT [51] use memory 
hard functions to perform key-stretching. In this paper we will simply assume that $H^k$ is $k$-times as expensive to compute without 
worrying about the specific key-stretching techniques that were employed to achieve this property. 

⁴In the traditional (deterministic) key-stretching setting it is clear the hash cost parameter $k = C_{max}$ is equivalent to the 
maximum authentication cost parameter $C_{max}$. However, this equivalence will not hold once we introduce a randomized running 
time parameter $t$. Thus, it is helpful to use separate notation to separate these distinct parameters. 


Server Cost. Under this traditional password mechanism the cost of verifying/rejecting a password is 
simply $k$. The authentication server can increase guessing costs for an offline adversary by increasing $k$, but 
in doing so the authentication server will increase its own authentication costs proportionally. 

Authentication Time Increase. By increasing the cost parameter $k$ the authentication server might 
potentially increase delay times for the user, especially if key-stretching is performed on a sequential 
computer. Bonneau and Schechter [22] estimated that $\mathrm{Cost}(H) \approx \$7 \times 10^{-15}$ for the SHA-256 hash function 
based on observations of the Bitcoin network. A modern CPU can evaluate SHA-256 around $10^7$ times per 
second so an authentication server who uses hash iteration for key-stretching would need to select $k \leq 10^7$ if 
he wants to ensure that user delay is at most one second. In this case we would seem to have an upper bound 
$\mathrm{Cost}(H^k) \leq \$7 \times 10^{-8}$ on the cost of a hash function that can be evaluated in 1 second. Fortunately, 
this bound only applies to naive hash iteration.⁵ More effective key-stretching techniques could be used 
to increase $\mathrm{Cost}(H^k)$ by several orders of magnitude without imposing longer 
authentication delays on the user (even if key-stretching is performed on a sequential computer). For example, 
the SCRYPT and Argon2 hash functions were intentionally designed to use a larger amount of 
memory so that it is not possible to (significantly) reduce hashing costs by developing customized hardware. 
Additionally, Argon2 [12], winner of the password hashing competition, has an optional parameter that 
would allow the authentication server to exploit parallelism to further reduce the amount of time necessary 
to perform key-stretching. 


2.3 Adversary Model 

We consider an untargeted offline attacker whose goal is to break as many passwords as possible. An offline 
attacker has breached the authentication server and has access to all of the data stored on the server. 
In the traditional authentication setting an offline adversary learns the tuple $(u, s_u, k, H^k(pwd_u, s_u))$ for 
each user u. The adversary will also learn the hash function H since the code to compute H is present 
on the authentication server. We assume that the adversary only uses H in a blackbox manner (e.g., the 
adversary can query H as a random oracle, but he cannot invert H). In general we assume the adversary 
will obtain the source code for any other procedures that are used during the authentication process. While 
the authentication server can limit the number of guesses that an online adversary can make (e.g., by locking 
the adversary out after three incorrect guesses), the authentication server cannot directly limit the number 
of guesses that an offline attacker can try. An offline attacker is limited only by the resources that s/he is 
willing to invest trying to crack the user’s password. 

We assume that the adversary has a value $v_u$ for cracking user $u$’s password. An untargeted offline 
attacker has the same value $v_u = v$ for every user $u$. Symantec recently reported that passwords sell for 
between $4 and $30 on the black market [36] so we might reasonably estimate that $v \in [\$4, \$30]$.⁶ 

We also assume that the adversary knows the empirical password distribution $p_1 \geq \ldots \geq p_n$ over user 
selected passwords as well as the corresponding passwords $pwd_1, \ldots, pwd_n$. Thus, the adversary knows that 
a random user will select $pwd_i$ with probability $p_i$, but the adversary does not know which users selected 
$pwd_i$. 

The adversary will select a threshold $B$ and check (up to) $B$ passwords. In this case the fraction of 
passwords that the offline adversary will break is at most $\sum_{i=1}^{B} p_i$. Equality holds when the offline adversary 
adopts his optimal guessing strategy and checks the $B$ most likely passwords $pwd_1, \ldots, pwd_B$. In this case 
the adversary’s utility would be 

$$U_{ADV}(B, v, k) = v \sum_{i=1}^{B} p_i - k \left( \sum_{i=1}^{B} i \cdot p_i + \sum_{i=B+1}^{n} B \cdot p_i \right).$$

The first term is the adversary’s expected reward. The last term is the adversary’s expected guessing cost.⁷ 
Let $B^* = B^*_{v,k} = \arg\max_B U_{ADV}(B, v, k)$ denote the adversary’s utility optimizing strategy. Then the 
fraction of passwords cracked by a rational adversary will be 

$$P_{ADV,v,k} = \sum_{i=1}^{B^*} p_i. \qquad (1)$$

⁵As we previously noted hash iteration alone is not a particularly effective key-stretching technique. The cost of computing 
SHA-256 can be reduced by a factor of about 1 million on customized hardware, e.g., see 
https://bitcoinmagazine.liberty.me/bitmain-announces-launch-of-next-generation-antminer-s7-bitcoin-miner/ (Retrieved 5/4/2016). 
Furthermore, we note that modern Bitcoin miners already use Application Specific Integrated Circuits to compute SHA-256 so 
the upper bound from [22] implicitly incorporates this dramatic cost reduction. By contrast, the adversary cannot (significantly) 
reduce the cost of evaluating a memory hard function by developing customized hardware. 

⁶However, this estimate of the adversary’s value could be too high because it does not account for the inherent risk of getting 
caught when selling/using the password. 
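The best response $B^*$ and the cracked fraction in equation (1) can be computed by brute force over all thresholds. The following is a minimal sketch, assuming the password probabilities are supplied as a list sorted in decreasing order; the function names and the numeric inputs are illustrative assumptions.

```python
from fractions import Fraction


def utility(p, B, v, k):
    """U_ADV(B, v, k): expected reward minus expected guessing cost."""
    reward = v * sum(p[:B])
    # The adversary stops after i guesses if the user chose pwd_i (i <= B),
    # and after B guesses otherwise.
    guesses = sum(i * p_i for i, p_i in enumerate(p[:B], start=1)) + B * sum(p[B:])
    return reward - k * guesses


def best_response(p, v, k):
    """Return (B*, fraction of passwords cracked) for a rational adversary."""
    # max() keeps the smallest threshold when several thresholds tie.
    B_star = max(range(len(p) + 1), key=lambda B: utility(p, B, v, k))
    return B_star, sum(p[:B_star])


# Assumed toy inputs: two passwords, hash cost k = 1, password value v = 2.
p = [Fraction(2, 3), Fraction(1, 3)]
B_star, cracked = best_response(p, v=2, k=1)
```

Exact rational arithmetic (`Fraction`) is used here so that ties between thresholds are not decided by floating point rounding.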



3 CASH Mechanism 

In this section we introduce the basic CASH mechanism, while deferring until later the question of how to 
optimize the parameters of the mechanism. 

3.1 CASH Authentication. 

Observe that in traditional password authentication the costs of verifying and rejecting a password guess 
are symmetric. The goal of CASH is to redesign the authentication mechanism so that these costs are not 
symmetric. In particular, we want to ensure that the cost of rejecting an incorrect password is greater 
than the cost of accepting a correct password. This is a desirable property because most of the adversary’s 
password guesses during an offline attack will be incorrect. By contrast, the authentication server will spend 
most of its effort authenticating legitimate users. 

3.1.1 Creating an Account 

Suppose that a user $u$ registers for an account with the password $pwd_u \in \mathcal{P}$. In CASH authentication the 
authentication server stores the value $(u, s_u, k, H^k(pwd_u, s_u, t_u))$. As before $s_u$ is a random salt value and 
$k$ is the number of hash iterations. The key difference is that we select a random value $t_u$ from the range 
$\{1, \ldots, m\}$. We stress that the value $t_u$ is not stored on the authentication server (unlike the salt value 
$s_u$). Thus, the value $t_u$ will not be available to an adversary who breaches the server. The account creation 
process is formally presented in Algorithm 1. We use the notation $t_u$ here to emphasize that this value is 
chosen independently for each user $u$. Intuitively, the parameter $t_u$ specifies the number of times that the 
authentication server needs to compute $H^k$ when verifying a correct password guess using CASH. 

3.1.2 Authentication 

When the user $u$ tries to authenticate using the password guess $pwd'_u$ the authentication server first locates 
the record $(u, s_u, k, H^k(pwd_u, s_u, t_u))$. The authentication server then computes $H^k(pwd'_u, s_u, t)$ for each 
value $t \in \{1, \ldots, m\}$. Authentication is successful if the hashes match for any value $t \in \{1, \ldots, m\}$. This 
is guaranteed to happen after $t_u$ steps whenever the user’s password is correct ($pwd'_u = pwd_u$), and this is 
highly unlikely whenever the user’s password is incorrect. The authentication process is formally presented 
in Algorithm 2. 

⁷Note that for $i \leq B$ the adversary finishes early after only $i$ guesses if and only if the user selected password $pwd_i$ (probability 
$p_i$). If the user selected password $pwd_i$ with $i > B$ then the adversary will quit after $B$ guesses. 

Table 1: Notation 

| Term | Explanation |
|---|---|
| $\mathcal{P}$ | space of passwords |
| $n$ | number of passwords in $\mathcal{P}$ |
| $pwd_i$ | the $i$’th most likely password in $\mathcal{P}$ |
| $p_i$ | probability that a random user selects $pwd_i$ |
| $m$ | the number of evaluations of $H^k$ necessary to reject an incorrect password using CASH |
| $t \in \{1, \ldots, m\}$ | hidden running time parameter which specifies the running time of CASH when verifying a correct password; $t$ is randomly selected during account creation |
| $\tilde{p}$ | a distribution over the hidden running time parameter $t$ |
| $\tilde{p}_j$ | the probability that the running time parameter is $t = j$ |
| $\pi_i$ | the probability of the $i$’th most likely tuple $(pwd, t)$ |
| $\alpha$ | probability of seeing a correct password in a random authentication session |
| $H$ | a cryptographic hash function with $\mathrm{Cost}(H) = 1$ |
| $H^k$ | a cryptographic hash function with $\mathrm{Cost}(H^k) = k$ |
| $C_{SRV,\alpha}$ | $mk(1 - \alpha) + \alpha k \sum_{t=1}^{m} t \cdot \tilde{p}_t$, the amortized cost of a random authentication session |
| $C_{max}$ | the maximum (amortized) cost that the authentication server is willing to incur per authentication |
| $v$ | adversary’s true value for a cracked password |
| $\hat{v}$ | the authentication server’s estimate for $v$ |
| $P^{CASH}_{ADV,v,\hat{v},C}$ | the fraction of passwords cracked by a rational value $v$ adversary, when the authentication server optimizes the CASH distribution $\tilde{p}$ under the belief $\hat{v}$ subject to the cost constraint $C_{SRV,\alpha} \leq C_{max}$ |
| $P^{pepper}_{ADV,v,C}$ | the fraction of passwords cracked by a rational value $v$ adversary, when the authentication server uses the uniform distribution $\tilde{p}_i = 1/m$; the hash cost parameter $k$ is now tuned subject to the cost constraint $C_{SRV,\alpha} \leq C$ |
| $P^{det}_{ADV,v,C}$ | the fraction of passwords cracked by a rational value $v$ adversary when the authentication server uses deterministic key-stretching techniques; the hash cost parameter is set to $k = C$ so that the server’s cost is $C$ for each authentication session |


3.1.3 CASH Notation 

We use $\tilde{p}_i$ to denote the probability that we set $t_u = i$ during the account creation process. For notational 
convenience we will assume that these values are sorted so that $\tilde{p}_1 \geq \cdots \geq \tilde{p}_m$. We will use $t \leftarrow \tilde{p}$ to denote a 
random sample from $\{1, \ldots, m\}$ in which $\Pr_{t \leftarrow \tilde{p}}[t = i] = \tilde{p}_i$. For now we assume that the CASH distribution 
$\tilde{p}$ is given to us. In later sections we will discuss how to select a good distribution $\tilde{p}$. 


Algorithm 1 CASH: Create Account 

Input: $u$, $pwd_u$, $\tilde{p} = (\tilde{p}_1, \ldots, \tilde{p}_m)$, $k$, $L$ 
    $s_u \xleftarrow{\$} \{0,1\}^L$ 
    $t_u \leftarrow \tilde{p}$ 
    $h \leftarrow H^k(pwd_u, s_u, t_u)$ 
    StoreRecord$(u, s_u, k, h)$ 


Algorithm 2 CASH: Authenticate 

Input: $u$, $pwd'_u$ 
    $R \leftarrow$ TryFindRecord$(u)$ 
    if $R = \emptyset$ then 
        return “Username Not Found.” 
    end if 
    $(u, s_u, k, h) \leftarrow R$ 
    for $t = 1, \ldots, m$ do 
        $h_t \leftarrow H^k(pwd'_u, s_u, t)$ 
        if $h_t = h$ then 
            return “Authentication Successful” 
        end if 
    end for 
    return “Authentication Failed” 
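Algorithms 1 and 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: iterated SHA-256 stands in for $H^k$ (the text recommends memory hard functions in practice), the record store is a plain dictionary, and all function and parameter names are hypothetical.

```python
import hashlib
import random
import secrets


def H_k(pwd: str, salt: bytes, t: int, k: int) -> bytes:
    """Stand-in for H^k: k >= 1 iterations of SHA-256 over (pwd, salt, t)."""
    digest = salt + pwd.encode() + t.to_bytes(4, "big")
    for _ in range(k):
        digest = hashlib.sha256(digest).digest()
    return digest


def create_account(records: dict, u: str, pwd: str, p_tilde, k: int, salt_len: int = 16):
    """Algorithm 1: sample a salt and a hidden t_u, store only (salt, k, hash)."""
    salt = secrets.token_bytes(salt_len)
    m = len(p_tilde)
    # t_u is drawn from the CASH distribution and is deliberately NOT stored.
    t_u = random.SystemRandom().choices(range(1, m + 1), weights=p_tilde)[0]
    records[u] = (salt, k, H_k(pwd, salt, t_u, k))


def authenticate(records: dict, u: str, guess: str, m: int):
    """Algorithm 2: return (message, number of H^k evaluations performed)."""
    if u not in records:
        return "Username Not Found.", 0
    salt, k, h = records[u]
    for t in range(1, m + 1):
        if H_k(guess, salt, t, k) == h:
            return "Authentication Successful", t
    return "Authentication Failed", m
```

The second return value makes the asymmetry visible: a correct guess stops after $t_u$ evaluations, while an incorrect guess always uses all $m$.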


3.2 Cost to Server 

The cost of rejecting an incorrect password guess is $m \cdot k$ because the server must evaluate $H^k(pwd'_u, s_u, t)$ 
for all $m$ possible values of $t \in \{1, \ldots, m\}$. However, whenever a password guess is correct the authentication 
server can halt computation as soon as it finds a match, which will happen after $t_u$ iterations. Here, we 
assume that the authentication server will minimize its amortized cost by trying the most likely values of 
$t_u$ first. If we let $\alpha$ denote the probability that the user enters his password correctly during a random 
authentication session then the amortized cost of the authentication server is 

$$C_{SRV,\alpha} = (1 - \alpha) k \cdot m + \alpha \cdot k \sum_{i=1}^{m} i \cdot \tilde{p}_i.$$

In general, we will assume that the server has a maximum amortized cost $C_{max}$ that it is willing to incur for 
authentication.⁸ Thus, the authentication server must pick the distribution $\tilde{p}$ subject to the cost constraint 
$C_{SRV,\alpha} \leq C_{max}$. 
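The amortized cost formula translates directly into code. The sketch below assumes the CASH distribution is given as a list sorted in decreasing order; `server_cost` and `max_hash_cost` are hypothetical helper names introduced only for illustration.

```python
def server_cost(p_tilde, k, alpha):
    """Amortized cost C_SRV,alpha of one authentication session."""
    m = len(p_tilde)
    # Expected number of H^k evaluations when the password guess is correct.
    expected_t = sum(i * p_i for i, p_i in enumerate(p_tilde, start=1))
    return (1 - alpha) * k * m + alpha * k * expected_t


def max_hash_cost(p_tilde, alpha, c_max):
    """Largest k with C_SRV,alpha <= c_max for a fixed distribution p_tilde."""
    # The cost is linear in k, so divide the budget by the cost at k = 1.
    return c_max / server_cost(p_tilde, 1, alpha)
```

Because the cost is linear in $k$, any budget left over by a distribution that front-loads small values of $t$ can be spent on a larger hash cost parameter.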


3.3 Adversary Response 

Fixing the CASH distribution $\tilde{p}$ induces a distribution over pairs $(pwd, t) \in \mathcal{P} \times \{1, \ldots, m\}$, namely 
$\Pr[(pwd_i, t)] = p_i \cdot \tilde{p}_t$. Once the adversary selects a threshold $B$ the adversary’s optimal strategy is to try 
the $B$ most likely pairs. In this case the adversary’s utility will be 

$$U_{ADV}(B, v) = v \sum_{i=1}^{B} \pi_i - k \sum_{i=1}^{B} i \cdot \pi_i - k \sum_{i=B+1}^{mn} B \cdot \pi_i, \qquad (2)$$

where the terms $\pi_1 \geq \ldots \geq \pi_{mn}$ denote the probabilities of each pair $(pwd, t) \in \mathcal{P} \times \{1, \ldots, m\}$ (in sorted 
order). In general, the distribution $\tilde{p}$ that the authentication server selects may depend on the maximum 
(amortized) server cost $C_{max}$ as well as our belief $\hat{v}$ about the adversary’s value for a cracked password. 
Once $\hat{v}$ and $C_{max}$ (and thus $\tilde{p}_1, \ldots, \tilde{p}_m$, $k$ and $\pi_1, \ldots, \pi_{mn}$) have been fixed we can let 
$B^* = \arg\max_B U_{ADV}(B, v)$ denote the adversary’s utility optimizing response. Then the fraction of passwords 
cracked by a rational adversary will be 

$$P^{CASH}_{ADV,v,\hat{v},C_{max}} = \sum_{i=1}^{B^*} \pi_i. \qquad (3)$$

⁸For example, $C_{max}$ might be (approximately) given by the maximum computational load that the authentication server(s) 
can handle divided by the maximum (anticipated) number of users authenticating at any given point in time. 
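Equations (2) and (3) can be evaluated by enumerating all pairs $(pwd, t)$, which is feasible only for small instances. The sketch below is illustrative: the function names are hypothetical, and the numeric inputs (including the value $v = 7/4$ with $k = 1/2$) are assumptions chosen for the demonstration, not values taken from the paper.

```python
from fractions import Fraction


def pair_probabilities(p, p_tilde):
    """pi_1 >= ... >= pi_{mn}: probabilities of all (pwd, t) pairs, sorted."""
    return sorted((p_i * q_t for p_i in p for q_t in p_tilde), reverse=True)


def cash_utility(pi, B, v, k):
    """Equation (2): utility of guessing the B most likely pairs."""
    reward = v * sum(pi[:B])
    cost = k * sum(i * x for i, x in enumerate(pi[:B], start=1)) + k * B * sum(pi[B:])
    return reward - cost


def cracked_fraction(p, p_tilde, v, k):
    """Equation (3): return (B*, fraction cracked) for a rational adversary."""
    pi = pair_probabilities(p, p_tilde)
    B_star = max(range(len(pi) + 1), key=lambda B: cash_utility(pi, B, v, k))
    return B_star, sum(pi[:B_star])


# Assumed inputs: two passwords and a non-uniform distribution over m = 5 values of t.
p = [Fraction(2, 3), Fraction(1, 3)]
p_tilde = [Fraction(9, 16), Fraction(1, 8), Fraction(1, 8), Fraction(1, 8), Fraction(1, 16)]
B_star, cracked = cracked_fraction(p, p_tilde, v=Fraction(7, 4), k=Fraction(1, 2))
# For these inputs B_star == 2 and cracked == 9/16.
```

For realistic password datasets the list of $mn$ pairs is far too large to enumerate naively, which is why the paper develops dedicated algorithms in later sections.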


Similarly, we will use $P^{pepper}_{ADV,v,C_{max}}$ to denote the fraction of passwords cracked by a rational adversary when $\tilde{p}$ 
is the uniform distribution.⁹ In this case the hash cost parameter $k$ is tuned to ensure that $C_{SRV,\alpha} \leq C_{max}$; 
this can be achieved when 

$$k = \frac{2 C_{max}}{2(1 - \alpha) m + \alpha (m + 1)}. \qquad (4)$$
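The bound on $k$ for the uniform distribution follows by substituting $\tilde{p}_i = 1/m$ into the amortized cost formula of Section 3.2. A short derivation, written in one of several equivalent forms:

```latex
\begin{align*}
C_{SRV,\alpha}
  &= (1-\alpha)\,k\,m + \alpha\,k \sum_{i=1}^{m} \frac{i}{m}
   = (1-\alpha)\,k\,m + \alpha\,k\,\frac{m+1}{2} \;\le\; C_{max} \\
\Longleftrightarrow\quad
k &\le \frac{2\,C_{max}}{2(1-\alpha)\,m + \alpha\,(m+1)} .
\end{align*}
```

Setting $\alpha = 1$ recovers $k = 2 C_{max}/(m+1)$, the value used in the uniform CASH examples.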


3.3.1 Example Distribution 

One simple, yet elegant, way to achieve the goal of cost asymmetry is to set $\tilde{p}_j = \frac{1}{m}$ for each 
$j \in \{1, \ldots, m\}$. We will sometimes call this solution uniform-CASH in this paper because it is a special case 
of the CASH mechanism. The amortized cost of verifying a correct password guess with uniform-CASH is 
$C_{SRV,1} = k \frac{m+1}{2}$. By contrast, the cost of rejecting an incorrect password guess is $k \cdot m$, approximately 
twice the cost of verifying a correct password guess. 


Examples with Analysis The above mechanism can already be used to significantly reduce the fraction 
of user passwords that would be cracked in an offline attack. We demonstrate the potential power of CASH 
with two (simplistic) examples. To keep the examples simple we will assume that users never forget 
or mistype their passwords (i.e., $\alpha = 1$). In the first example, every user selects one of two passwords 
(e.g., $pwd_1$ = “123456” and $pwd_2$ = “iloveyou”) with probability $p_1 = 2/3$ and $p_2 = 1/3$ respectively, and the 
untargeted adversary has a value of $v = \frac{4}{3} C_{max} + \epsilon$, just slightly more than $C_{max}$, the amortized cost 
incurred by the authentication server during an authentication session. 


• (Deterministic Key-Stretching) The defender sets the hash cost parameter $k = C_{max}$ and stores the 
deterministic hash value $H^k(pwd_u, s_u)$. It is easy to check that the adversary’s optimal response is to choose 
the maximum threshold $B^* = 2$. In this case the adversary cracks the password with probability 
$P^{det}_{ADV,v,C_{max}} = 1$. 


• (Uniform CASH) The defender sets $\tilde{p}_i = \frac{1}{m}$ for each $i$ and he selects cost parameter $k = 2 \cdot C_{max}/(m+1)$ 
to ensure that $C_{SRV,1} \leq C_{max}$ (see eq. (4)). It is not too difficult to see that the adversary’s optimal 
response is to choose the threshold $B^* = 0$ (i.e., give up without guessing).¹⁰ Thus, $P^{pepper}_{ADV,v,C_{max}} = 0$. 

This first example illustrates the potential advantage of randomization. The next example illustrates 
the potential advantage of non-uniform distributions. Example 2 is the same as example 1 except that we 
increase the adversary’s value to u = ^Cmax- 

• (Deterministic Key-Stretching) Increasing $v$ can only increase $P^{det}_{ADV,v,C_{max}}$. Thus, $P^{det}_{ADV,v,C_{max}} = 1$. 

• (Uniform CASH) Now the adversary’s optimal strategy is to choose the maximum threshold $B^* = 2m$ 
(i.e., keep guessing until he finds the password). Thus, $P^{pepper}_{ADV,v,C_{max}} = 1$. 

⁹Note that $P^{pepper}_{ADV,v,C_{max}}$ does not depend on $\hat{v}$, our belief about the adversary’s value, because the choice of $\tilde{p}$ (and $k$) is 
independent of this belief. 

¹⁰In particular, if the adversary instead sets $B^* = 2m$ (i.e., keeps guessing until he succeeds) then his expected guessing costs 
will be $p_1 k \left(\frac{m+1}{2}\right) + (1 - p_1) k \left(m + \frac{m+1}{2}\right) = (1 - p_1) k m + k \frac{m+1}{2} = C_{max} + (1 - p_1) k m$. 







• (Non-uniform CASH) Suppose that the authentication server, knowing the adversary’s value ($\hat{v} = v$), sets 
$m = 5$, $k = C_{max}/2$ and sets $\tilde{p}_1 = 9/16$, $\tilde{p}_2 = \tilde{p}_3 = \tilde{p}_4 = 1/8$ and $\tilde{p}_5 = 1/16$.¹¹ In this case it is 
possible to verify that the adversary’s optimal response is to set $B^* = 2$, meaning that the adversary 
will try guessing the two most likely pairs $(pwd_1, t = 1)$ and $(pwd_2, t = 1)$ before giving up. Thus, 
$P^{CASH}_{ADV,v,\hat{v},C_{max}} = (p_1 + p_2) \tilde{p}_1 = \frac{9}{16} < 1$. 

Admittedly these examples are both overly simplistic. However, we will later consider several empirical 
password distributions and demonstrate that non-uniform CASH distributions are often significantly better 
than both uniform CASH and deterministic key-stretching. 
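The arithmetic behind the non-uniform example can be checked by hand. The worked computation below uses only the probabilities stated in the example (with $\alpha = 1$ and $k = C_{max}/2$); it confirms the server’s cost and the success probability of the two most likely pairs, but does not by itself establish that $B^* = 2$ is optimal.

```latex
\begin{align*}
\sum_{i=1}^{5} i \cdot \tilde{p}_i
  &= \tfrac{9}{16} + 2\cdot\tfrac{1}{8} + 3\cdot\tfrac{1}{8} + 4\cdot\tfrac{1}{8} + 5\cdot\tfrac{1}{16}
   = \tfrac{9 + 4 + 6 + 8 + 5}{16} = 2,
  \qquad\text{so } C_{SRV,1} = 2k = C_{max}, \\
\pi_1 &= p_1 \tilde{p}_1 = \tfrac{2}{3}\cdot\tfrac{9}{16} = \tfrac{3}{8}, \qquad
\pi_2 = p_2 \tilde{p}_1 = \tfrac{1}{3}\cdot\tfrac{9}{16} = \tfrac{3}{16}, \qquad
\pi_3 = p_1 \tilde{p}_2 = \tfrac{2}{3}\cdot\tfrac{1}{8} = \tfrac{1}{12}, \\
\pi_1 + \pi_2 &= \tfrac{3}{8} + \tfrac{3}{16} = \tfrac{9}{16}.
\end{align*}
```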


4 Stackelberg Model 


In the last section we observed that uniform CASH can reduce the adversary’s success rate compared to 
deterministic key-stretching techniques with comparable costs. We also saw that sometimes it is possible to 
do even better than uniform CASH by selecting a non-uniform distribution over $t$.¹² This observation leads 
us to ask the following question: What distribution over $t$ leads to the optimal security results? 

In this section we first formalize the problem of finding the optimal CASH distribution parameters 
$\tilde{p}_1 \geq \cdots \geq \tilde{p}_m \geq 0$. Intuitively, we can view this problem as the problem of computing the Stackelberg equilibria 
of a certain game between the authentication server and an untargeted offline adversary. Stackelberg games 
and their applications have been an active area of research in the last decade (e.g., [40]). For now 
we will simply focus on formulating this goal as an optimization problem. In later sections we will present a 
polynomial time algorithm to find good solutions to this optimization problem (Sections 5.1 and 5.2) and we will 
evaluate this algorithm on empirical password datasets. 

Before the Stackelberg game begins the adversary is given a value $v$ for cracked passwords and the 
authentication server is given an honest estimate $\hat{v} = v$ of the adversary’s value.¹³ The authentication server 
is also given a bound $C_{max}$ on the expected cost of an authentication round. 


Defender Action The authentication server (leader) moves first in our Stackelberg game. The 
authentication server must commit to a CASH distribution $\tilde{p}$ and a hash cost parameter $k$. The values must be 
selected subject to a constraint on the maximum amortized cost for the authentication server 

$$C_{SRV,\alpha} = (1 - \alpha) m \cdot k + \alpha \cdot k \sum_{i=1}^{m} (i \cdot \tilde{p}_i) \leq C_{max}.$$

Intuitively, we can view the value $\alpha$ as being given by nature and the parameter $C_{max}$ as being given by the 
computational resources of the authentication server. 


Offline Adversary After the authentication server commits to $\tilde{p}$ and $k$ the offline adversary is given access to all of the hashed passwords stored on the authentication server. The adversary can try guesses of the form $(pwd_i, j)$. This particular guess is correct if and only if the user $u$ selected password $pwd_u = pwd_i$ and we selected the secret salt value $t_u = j$. For an untargeted attacker the probability that this guess is correct is $p_i \cdot \tilde{p}_j$. We can describe the action of a rational adversary using a threshold $B$ which denotes the maximum number of pairs $(pwd, t)$ that he will check (equivalently the maximum number of times he will compute $H^k$). Intuitively, we don't need to specify which pairs the adversary guesses because a rational adversary will always check the $B$ most likely pairs.
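To make the threshold adversary concrete, here is a small sketch of how a best-response threshold could be computed by brute force. It assumes the adversary's utility is the expected reward $v \sum_{i \le B} \pi_i$ minus $k$ times the expected number of hash evaluations (the "expected reward minus expected guessing costs" described in the abstract); the function name and the toy inputs are hypothetical.

```python
def best_response(passwords, p_tilde, v, k):
    """Return (B_star, success_rate) for a utility-maximizing threshold adversary."""
    # Probabilities of all (password, salt) tuples, most likely first.
    pi = sorted((p * q for p in passwords for q in p_tilde), reverse=True)
    best_b, best_utility, best_success = 0, 0.0, 0.0
    success = 0.0  # probability that one of the first B guesses is correct
    cost = 0.0     # expected number of hash evaluations with threshold B
    for b, prob in enumerate(pi, start=1):
        # The b-th guess is only made if the first b - 1 guesses all failed.
        cost += 1.0 - success
        success += prob
        utility = v * success - k * cost
        if utility > best_utility:
            best_b, best_utility, best_success = b, utility, success
    return best_b, best_success


# Two equally likely passwords, a single salt value, and a high-value target.
b_star, cracked = best_response([0.5, 0.5], [1.0], v=10.0, k=1.0)
```

A threshold of $B = 0$ (utility 0) corresponds to an adversary who gives up without guessing, which is what the sketch returns when every positive threshold has negative utility.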

It is easy to verify that $C_{SRV,\alpha} = 2k = C_{max}$.

Of course, in some cases the uniform distribution might still be optimal.

In the game the authentication server will assume that $\hat{v}$ is indeed the correct value when he computes the distribution $\tilde{p}$. Of course, in our empirical analysis we will also be interested in exploring how CASH performs when this estimate is incorrect ($\hat{v} \neq v$).
We remark that we assume that an offline attacker will be able to obtain the CASH parameters $\tilde{p}_1, \ldots, \tilde{p}_m$ and $k$ that we select. The adversary also knows the empirical password distribution $p_1 \ge \ldots \ge p_n$ and the associated passwords $pwd_1, \ldots, pwd_n$.

Optimization Goal Informally, the defender's goal is to minimize the probability that the rational adversary succeeds in cracking each user's password. The distribution that achieves this goal is the Stackelberg equilibrium of our game. Formally, our optimization goal is presented as Optimization Goal 1. We are given as input the empirical password distribution $p_1, \ldots, p_n$ as well as the value $v$ for the adversary, a maximum cost $C_{max}$ for the authentication server, the CASH parameter $m$ and the fraction $\alpha$ of authentication sessions in which users enter their correct password. We want to find values $\tilde{p}_1, \ldots, \tilde{p}_m$ and $k$ that minimize the fraction of cracked passwords $\sum_{i=1}^{B^*} \pi_i$ subject to several constraints. Constraints 1 and 2 ensure that $\tilde{p}_1, \ldots, \tilde{p}_m$ form a valid probability distribution, and constraint 3 ensures that the amortized cost of authentication is at most $C_{max}$. Constraint 4 simply defines the variables $\pi_1, \ldots, \pi_{mn}$, where $\pi_i$ is the probability of the $i$'th most likely tuple $(pwd, t)$. Constraint 5 implies that $B^*$ is the adversary's optimal response (i.e., $U^{CASH}_{ADV}(B^*, v) \ge U^{CASH}_{ADV}(B, v)$ for any other threshold $B$ that the adversary might choose). Finally, $\sum_{i=1}^{B^*} \pi_i$, our minimization goal, is the fraction of passwords cracked under the adversary's utility optimizing response $B^*$.


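Optimization Goal 1: Minimize Adversary Success Rate
Input Parameters: $p_1, \ldots, p_n$, $v$, $C_{max}$, $m$ and $\alpha$.
Variables: $\tilde{p}_1, \ldots, \tilde{p}_m$, $\pi_1, \ldots, \pi_{mn}$, $k$, $B^*$
minimize $\sum_{i=1}^{B^*} \pi_i$ subject to

(1) $1 \ge \tilde{p}_1 \ge \ldots \ge \tilde{p}_m \ge 0$,

(2) $\sum_{i=1}^{m} \tilde{p}_i = 1$,

(3) $(1 - \alpha) m k + \alpha k \sum_{i=1}^{m} (i \cdot \tilde{p}_i) \le C_{max}$,

(4) $\pi_1, \ldots, \pi_{mn} = \mathrm{Sort}(p_1 \cdot \tilde{p}_1, \ldots, p_n \cdot \tilde{p}_m)$, and

(5) $\forall B \in \{0, 1, \ldots, mn\}$ we have $U^{CASH}_{ADV}(B^*, v) \ge U^{CASH}_{ADV}(B, v)$.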


Unfortunately, Optimization Goal 1 is inherently non-convex due to the combination of constraints 4 and 5. Thus, it is not clear whether or not there is a polynomial time algorithm to compute the Stackelberg equilibria. However, as we will see in the next section, there is a polynomial time algorithm to solve a very closely related goal: minimize the number of passwords that a threshold $B$ adversary can crack (Optimization Goal 2).


5 Algorithms 


In this section we show how the goal of minimizing the success rate of a threshold $B$ adversary can be formulated as a linear program with exponentially many constraints (Optimization Goal 2). We also show that this linear program can be solved in polynomial time by developing an efficient separation oracle. Unfortunately, this polynomial time algorithm is not efficient enough to solve the large real-world instances we consider in our experiments in Section 6. However, building on ideas from Section 5.1 we develop a more efficient (in practice) algorithm in Section 5.2. This new algorithm always finds an approximately optimal solution to Optimization Goal 2. While we do not have any theoretical guarantees about its running time, we


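The offline adversary has already breached the authentication server, which will contain code to sample $t_u$ whenever a new user $u$ creates an account.

Substituting in the formula for $U^{CASH}_{ADV}(B, v)$, constraint 5 becomes $v \sum_{i=1}^{B} \pi_i - k \sum_{i=1}^{B} i \cdot \pi_i - k \sum_{i=B+1}^{mn} B \cdot \pi_i \le v \sum_{i=1}^{B^*} \pi_i - k \sum_{i=1}^{B^*} i \cdot \pi_i - k \sum_{i=B^*+1}^{mn} B^* \cdot \pi_i$, where $\pi_i$ depends on the Sort operation.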
found that it converged quickly on every instance we tried. Furthermore, as we will see in our experimental evaluation, the algorithm results in significantly improved Stackelberg strategies.

We remark that our experimental results in Section 6 can be understood without reading this section. In particular, it is possible to view the algorithms in Sections 5.1 and 5.2 as a blackbox heuristic algorithm that finds reasonably good solutions to Optimization Goal 1. A more empirically inclined reader may wish to skip to our experimental results in Section 6 after skimming through this section.

5.1 LP Formulation 

We first show how to state our goal, minimize the number of passwords that a threshold $B$ adversary will crack, as a linear program. Our LP uses the following variables: $P_{Adv,B}, \tilde{p}_1, \ldots, \tilde{p}_m$. Intuitively, the variable $P_{Adv,B}$ represents the fraction of passwords that a threshold $B$ adversary can crack. At a high level our Linear Program can be understood as follows: minimize $P_{Adv,B}$ subject to the requirement that no feasible strategy for the threshold $B$ adversary achieves a success rate greater than $P_{Adv,B}$. This requirement can be expressed as a combination of exponentially many linear constraints. Formally, our LP is presented as Optimization Goal 2.


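Optimization Goal 2: Minimize Threshold B Adversary Success Rate
Input Parameters: $p_1, \ldots, p_n$, $B$, $C_{max}$, $m$, $k$, $\alpha$
Variables: $\tilde{p}_1, \ldots, \tilde{p}_m$, $P_{Adv,B}$
minimize $P_{Adv,B}$ subject to

(1) $1 \ge \tilde{p}_1 \ge \ldots \ge \tilde{p}_m \ge 0$,

(2) $\sum_{i=1}^{m} \tilde{p}_i = 1$,

(3) $(1 - \alpha) m k + \alpha k \sum_{i=1}^{m} (i \cdot \tilde{p}_i) \le C_{max}$,

(4) $0 \le P_{Adv,B} \le 1$, and

(5) $\forall S \subseteq \mathcal{P} \times \{1, \ldots, m\}$ s.t. $|S| = B$ we have $P_{Adv,B} \ge \sum_{(i,j) \in S} p_i \cdot \tilde{p}_j$.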


The key intuition is that all of the (5) constraints ensure that $P_{Adv,B}$ is at least as big as the best success rate for a threshold $B$ adversary. This is true because the optimal guessing strategy for a threshold $B$ adversary is to guess the $B$ most likely tuples $(pwd, t)$. Let $S'$ denote these $B$ most likely tuples; then one of the type (5) constraints says that $P_{Adv,B} \ge \sum_{(i,j) \in S'} p_i \cdot \tilde{p}_j$. Thus, type (5) constraints guarantee we cannot 'cheat' by pretending that the adversary will follow a suboptimal strategy (e.g., spending his guessing budget on the least likely passwords) when we solve Optimization Goal 2.

The key challenge in solving Optimization Goal 2 is that there are exponentially many type (5) constraints. Our main result in this section states that we can still solve this problem in polynomial time.

Theorem 1. We can find the solutions to Optimization Goal 2 in polynomial time in $m$, $n$ and $L$, where $L$ is the bit precision of our inputs.

The proof of Theorem 1 can be found in the appendix. We briefly overview the proof strategy here. The key idea is to build a polynomial time separation oracle for Optimization Goal 2. Given a candidate solution $\tilde{p}$ the separation oracle should either tell us that the solution is feasible (satisfies all type (5) constraints) or it should find an unsatisfied constraint. We can then use the ellipsoid method [44] with our separation oracle to solve the linear program in polynomial time. In the appendix we show how to develop a polynomial time separation oracle for our linear programs. Intuitively, the separation oracle simply sorts the tuples $\mathcal{P} \times \{1, \ldots, m\}$ using the associated probabilities $\Pr[(pwd_i, t)] = p_i \cdot \tilde{p}_t$. Then we can find the set $S'$ of the $B$ most likely tuples and check to see if the constraint $P_{Adv,B} \ge \sum_{(i,j) \in S'} p_i \cdot \tilde{p}_j$ is satisfied.
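The separation oracle described above amounts to a sort followed by a single comparison. The sketch below illustrates the idea on an explicit (uncompressed) password distribution; the function name, the tolerance parameter and the return convention (`None` for "feasible") are assumptions made for illustration.

```python
def separation_oracle(passwords, p_tilde, B, p_adv, tol=1e-12):
    """Return None if every type (5) constraint holds, else the violated set S'."""
    # One entry per (password index i, salt index j) with probability p_i * p_tilde_j.
    tuples = [(p * q, i, j)
              for i, p in enumerate(passwords, start=1)
              for j, q in enumerate(p_tilde, start=1)]
    tuples.sort(reverse=True)
    top = tuples[:B]  # the B most likely tuples
    mass = sum(prob for prob, _, _ in top)
    if p_adv + tol >= mass:
        return None  # the tightest constraint holds, so all of them do
    return [(i, j) for _, i, j in top]


violated = separation_oracle([0.5, 0.3, 0.2], [0.5, 0.5], B=2, p_adv=0.4)
```

Only the constraint for the $B$ most likely tuples needs to be checked, because every other set of size $B$ has a sum that is no larger.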




Once we have a polynomial time algorithm to solve Optimization Goal 2 for a fixed value of $k$ we could adopt the multiple LP framework of Conitzer and Sandholm [29] to include $k$ as an optimization parameter. The idea is simple. Because the range of possible values of $k$ is small ($k \le C_{max}$) we can simply solve Optimization Goal 2 separately for each value of $k$ and take the best solution, i.e., the one with the smallest value of $P_{Adv,B}$.


5.2 Practical CASH Optimization 

Theorem 1 states that Optimization Goal 2 can be solved in polynomial time using the ellipsoid algorithm. While this is nice in theory, the ellipsoid algorithm is rarely deployed in practice because the running time tends to be very large. In this section we develop a heuristic algorithm (Algorithm 3) to solve Optimization Goal 2 using our separation oracle. While Algorithm 3 is guaranteed to always find the (approximately) optimal solution to Optimization Goal 2, we do not have any theoretical proof that it will converge to the optimal solution in polynomial time. However, in all of our experiments we found that Algorithm 3 converged reasonably quickly.

The basic idea behind our heuristic algorithm is to start by ignoring all of the type (5) constraints from Optimization Goal 2. We then run a standard LP solver to find the optimal solution to the resulting LP. Finally, we run our separation oracle to determine if this solution violates any type (5) constraints. If it does not then we are done. If the separation oracle does find a violated type (5) constraint then we add this constraint to our LP and solve again. We repeat this process until we have a solution that satisfies all type (5) constraints. Observe that this process must terminate because we will eventually run out of type (5) constraints to add. The hope is that our algorithm will converge much more quickly. In practice, we find that it does (e.g., at most 25 iterations).
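The loop just described (solve, call the oracle, add the violated constraint, repeat) can be illustrated on a deliberately tiny instance. The sketch below assumes $m = 2$, so that $\tilde{p} = (x, 1 - x)$ and constraints (1)-(3) reduce to an interval $x \in [x_{lo}, x_{hi}]$; a small exact one-dimensional solver stands in for the LP solver. All function names are hypothetical and this is not the authors' implementation.

```python
from itertools import combinations


def top_b_cut(passwords, x, B):
    """Separation oracle for m = 2 with p_tilde = (x, 1 - x).

    Returns coefficients (a, b) of the cut P >= a*x + b*(1 - x) induced by
    the B most likely (password, salt) tuples under the current x.
    """
    tuples = [(p * x, 1, p) for p in passwords]
    tuples += [(p * (1 - x), 2, p) for p in passwords]
    top = sorted(tuples, reverse=True)[:B]
    a = sum(p for _, salt, p in top if salt == 1)
    b = sum(p for _, salt, p in top if salt == 2)
    return a, b


def solve_restricted_lp(cuts, x_lo, x_hi):
    """Toy stand-in for LPSolve: minimize the max over cuts of a*x + b*(1 - x)."""
    def value(x):
        return max((a * x + b * (1 - x) for a, b in cuts), default=0.0)
    # A convex piecewise-linear function attains its minimum at an endpoint
    # or at a point where two of its linear pieces intersect.
    candidates = {x_lo, x_hi}
    for (a1, b1), (a2, b2) in combinations(cuts, 2):
        denom = (a1 - b1) - (a2 - b2)
        if abs(denom) > 1e-15:
            x = (b2 - b1) / denom
            if x_lo <= x <= x_hi:
                candidates.add(x)
    x_best = min(sorted(candidates), key=value)
    return x_best, value(x_best)


def constraint_generation(passwords, B, x_lo, x_hi=1.0, eps=1e-9, max_iter=1000):
    cuts = []  # start with no type (5) constraints
    for iteration in range(1, max_iter + 1):
        x, p_adv = solve_restricted_lp(cuts, x_lo, x_hi)
        a, b = top_b_cut(passwords, x, B)
        if a * x + b * (1 - x) <= p_adv + eps:
            return x, p_adv, iteration  # no violated constraint remains
        cuts.append((a, b))
    raise RuntimeError("no feasible solution found within max_iter rounds")


x, p_adv, rounds = constraint_generation([0.5, 0.3, 0.2], B=2, x_lo=0.5)
```

In this toy instance only a couple of cuts are generated before the oracle reports that no constraint is violated, which mirrors the observation that far fewer than all type (5) constraints are needed in practice.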

Further Optimizations Our separation oracle runs in time $O(mn \log mn)$ because we sort a list of $mn$ tuples $(pwd, t)$. In practice, the number of passwords $n$ might be very large (e.g., the RockYou dataset contains $n \approx 14.3 \times 10^6$ unique passwords). Fortunately, it is often possible to drastically reduce the time and space requirements of our separation oracle by grouping passwords into equivalence classes. In particular, we group two passwords $pwd_i$ and $pwd_j$ into an equivalence class if and only if $p_i = p_j$. This approach reduces the running time of our separation oracle to $O(mn' \log mn')$, where $n'$ is the number of equivalence classes. For example, the RockYou database contains over $10^7$ unique passwords, but we only get $n' = 2040$ equivalence classes.

We can represent our empirical distribution over passwords as a sequence of $n'$ pairs $(p_1, n_1), \ldots, (p_{n'}, n_{n'})$. Here, $p_i$ denotes the probability of a password in equivalence class $i$ and $n_i \in \mathbb{N}$ denotes the total number of passwords in equivalence class $i$. We have $\sum_{i=1}^{n'} n_i = n$ and $\sum_{i=1}^{n'} n_i \cdot p_i = 1$. As before we assume that $p_i > p_{i+1}$. In most password datasets $n_{n'}$ is the number of passwords that were selected by only one user (e.g., for the RockYou dataset $n_{n'} \approx 11.9 \times 10^6$).
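The compact representation can be built directly from password frequency counts. The following sketch assumes the input is a mapping from password to the number of users who chose it; the function name and the toy data are hypothetical.

```python
from collections import Counter


def compact_distribution(password_counts):
    """Group passwords by frequency; returns [(p_i, n_i), ...], most likely first."""
    total = sum(password_counts.values())
    # Map each observed frequency to the number of passwords with that frequency.
    class_sizes = Counter(password_counts.values())
    return [(freq / total, size)
            for freq, size in sorted(class_sizes.items(), reverse=True)]


counts = {"123456": 4, "password": 2, "qwerty": 2, "a1": 1, "b2": 1}
classes = compact_distribution(counts)
```

Here five distinct passwords collapse into three equivalence classes, and the invariants $\sum_i n_i = n$ and $\sum_i n_i \cdot p_i = 1$ can be checked directly on the result.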

We now argue that this change in view does not fundamentally alter our linear program (Optimization Goal 2) or our separation oracle. Constraints (1)-(4) in our LP remain unchanged. We need to make a few notational changes to type (5) constraints to ensure that $P_{Adv,B}$ is at least as large as the success rate of the optimal adversary. We use

$$\mathcal{F}_B = \left\{ (b_1, \ldots, b_{n'}) \in \mathbb{N}^{n'} \;:\; \sum_{i=1}^{n'} b_i \le B \,\wedge\, \forall i.\; b_i \le m \cdot n_i \right\}$$

to describe the space of feasible guessing strategies for an adversary with a threshold $B$. Here, $b_i$ denotes the total number of times the adversary evaluates $H^k$ while attacking passwords in equivalence class $i \le n'$. Thus, the range of $b_i$ is $0 \le b_i \le m \cdot n_i$ because there are $n_i$ passwords in the equivalence class to attack and he can choose to evaluate $H^k$ up to $m$ times for each password.

To save computation one could also group passwords into equivalence classes with approximately equal probabilities, but this representation loses some accuracy and was unnecessary in all of our experiments.
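Given values $\tilde{p}_1, \ldots, \tilde{p}_m$ and a feasible allocation $(b_1, \ldots, b_{n'}) \in \mathcal{F}_B$ the probability that the adversary will crack the password is at most

$$\sum_{i=1}^{n'} p_i \left( (b_i \bmod n_i) \, \tilde{p}_{\lceil b_i / n_i \rceil} + n_i \sum_{j=1}^{\lfloor b_i / n_i \rfloor} \tilde{p}_j \right) .$$

Intuitively, the optimal adversary will spend equal effort ($b_i / n_i$) cracking each password in an equivalence class because they all have the same probability. The $(b_i \bmod n_i)$ and $\lfloor b_i / n_i \rfloor$ terms handle the technicality that $b_i$ may not be divisible by $n_i$. Thus, we can replace our type (5) constraints with the constraint

$$P_{Adv,B} \ge \sum_{i=1}^{n'} p_i \left( (b_i \bmod n_i) \, \tilde{p}_{\lceil b_i / n_i \rceil} + n_i \sum_{j=1}^{\lfloor b_i / n_i \rfloor} \tilde{p}_j \right)$$

for every $(b_1, \ldots, b_{n'}) \in \mathcal{F}_B$.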

Our modified separation oracle works in essentially the same way. We sort the tuples $(i, j)$ using the values $p_{i,j} = p_i \cdot \tilde{p}_j$ and select the $B$ largest tuples. The only difference is that the adversary is now allowed to select the tuple $(i, j)$ up to $n_i$ times. In this section we will use SeparationOracle to refer to the modified separation oracle, which runs in time $O(mn' \log mn')$ using our compact representation of the empirical password distribution.
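A sketch of the modified oracle on the compact representation follows. It greedily fills the guessing budget $B$ from the most likely class/salt tuples, taking each tuple up to $n_i$ times; the function name and the returned allocation vector are assumptions for illustration, not the paper's interface.

```python
def compact_separation_oracle(classes, p_tilde, B):
    """classes: [(p_i, n_i), ...]. Returns (success, b) where b[i] counts the
    guesses spent on equivalence class i by the best threshold-B adversary."""
    tuples = sorted(
        ((p * q, i, n) for i, (p, n) in enumerate(classes) for q in p_tilde),
        key=lambda t: t[0], reverse=True)
    remaining, success = B, 0.0
    b = [0] * len(classes)
    for prob, i, n in tuples:
        if remaining == 0:
            break
        take = min(n, remaining)  # tuple (i, j) may be selected up to n_i times
        success += take * prob
        b[i] += take
        remaining -= take
    return success, b


success, allocation = compact_separation_oracle(
    [(0.4, 1), (0.2, 2), (0.1, 2)], [0.5, 0.5], B=3)
```

The candidate solution is feasible exactly when the returned success value does not exceed $P_{Adv,B}$; otherwise the allocation identifies a violated constraint to add to the LP.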

Our heuristic algorithm (Algorithm 3) takes as input an approximation parameter $\epsilon$. It is allowed to output a solution $\tilde{p}_1, \ldots, \tilde{p}_m, P_{Adv,B}$ as long as the solution is within $\epsilon$ of optimal: for any other feasible solution $\tilde{p}'_1, \ldots, \tilde{p}'_m, P'_{Adv,B}$ we have $P_{Adv,B} \le P'_{Adv,B} + \epsilon$. We use Slack to denote a function that computes how badly a linear inequality $C$ is violated. For example, if $C$ denotes the inequality $x + y \ge 2.5$ and we have set $x' = y' = 1$ then Slack$(C, x', y') = 0.5$ (e.g., if we introduced a slack variable $z$ then we would need to select $z'$ such that $|z'| = 0.5$ to satisfy the inequality $x' + y' + z' \ge 2.5$).
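The Slack function can be sketched in a couple of lines. The representation of an inequality as a coefficient list and a lower bound is an assumption made for this illustration.

```python
def slack(coeffs, bound, values):
    """Violation of the inequality sum_i coeffs[i] * values[i] >= bound.

    Returns 0.0 when the inequality is satisfied.
    """
    lhs = sum(c * x for c, x in zip(coeffs, values))
    return max(0.0, bound - lhs)


# The inequality x + y >= 2.5 evaluated at x' = y' = 1.
violation = slack([1.0, 1.0], 2.5, [1.0, 1.0])
```

This reproduces the worked example in the text, where the inequality falls short by 0.5.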
Algorithm 3 Optimize$(p, n, B, C_{max}, \alpha, \epsilon, m, S)$

Input: $p_1, \ldots, p_{n'}$, $n_1, \ldots, n_{n'}$, $B$, $C_{max}$, $\alpha$, $\epsilon$, $m$, $S = \{k_0, \ldots, k_r\}$

bestSolution ← ∅, bestK ← $k_0$
bestSuccessRate ← 1.0, slack ← $\epsilon$
for $j = 0, \ldots, r$ do
    $k$ ← $k_j$
    $C$ ← InitialConstraints$(C_{max}, \alpha, k, m)$    {Initially, $C$ only includes constraints (1)-(4) in Optimization Goal 2.}
    Goal ← $\{\min P_{Adv,B}\}$
    Vrbls ← $\{P_{Adv,B}, \tilde{p}_1, \ldots, \tilde{p}_m\}$
    $P'_{Adv,B}, \tilde{p}'_1, \ldots, \tilde{p}'_m$ ← LPSolve(Goal, Vrbls, $C$)
    $\tilde{p}'$ ← $(\tilde{p}'_1, \ldots, \tilde{p}'_m)$
    $p$ ← $(p_1, \ldots, p_{n'})$
    $n$ ← $(n_1, \ldots, n_{n'})$
    $Sep_{in}$ ← $(p, n, \tilde{p}', B, k, C_{SRV,\alpha}, P'_{Adv,B})$
    $C'$ ← SeparationOracle$(Sep_{in})$
    while Slack$(C', \tilde{p}', P'_{Adv,B}) > \epsilon$ ∧ ($C' \neq$ "Ok") do
        $C$ ← $C \cup \{C'\}$
        $P'_{Adv,B}, \tilde{p}'$ ← LPSolve(Goal, Vrbls, $C$)    {$\tilde{p}' = (\tilde{p}'_1, \ldots, \tilde{p}'_m)$}
        $Sep_{in}$ ← $(p, n, \tilde{p}', B, k, C_{SRV,\alpha}, P'_{Adv,B})$
        $C'$ ← SeparationOracle$(Sep_{in})$
    end while
    if bestSuccessRate $\ge P'_{Adv,B}$ then
        bestSolution ← $\tilde{p}'_1, \ldots, \tilde{p}'_m$
        bestSuccessRate ← $P'_{Adv,B}$
        bestK ← $k_j$
        slack ← Slack$(C', \tilde{p}', P'_{Adv,B})$
    end if
end for
return bestSolution, bestK


5.3 Choosing a CASH Distribution 

While Algorithm 3 efficiently solves Optimization Goal 2, it may not yield the optimal distribution for our original Stackelberg game. In particular, while Algorithm 3 gives the optimal distribution against a threshold-$B$ adversary, the rational adversary might choose to use a different threshold $B^* \neq B$.

We introduce a heuristic algorithm to find good Stackelberg strategies (CASH distributions) for the defender. Algorithm 4 uses Algorithm 3 as a subroutine to search for good CASH distributions. Algorithm 4 takes as input an (estimate) $\hat{v}$ of the adversary's value and a set $\mathcal{B}$ of potential adversary thresholds $B$ and runs Algorithm 3 to compute the optimal distribution for each threshold. We then compute the rational value $\hat{v}$ adversary's best response to each of these distributions and find the best distribution for the authentication server: the one which results in the lowest fraction of cracked passwords under the corresponding best adversary response. Algorithm 4 assumes a subroutine RationalAdvSuccess$(p, n, v, \tilde{p}, k)$, which computes the fraction of cracked passwords under a value $v$ adversary's best response to the CASH distribution $\tilde{p}$ with empirical password distribution defined by the pair $(p, n)$ and a hash cost parameter $k$.
Algorithm 4 FindCASHDistribution

Input: $p_1, \ldots, p_{n'}$, $n_1, \ldots, n_{n'}$, $\hat{v}$, $C_{max}$, $\alpha$, $\epsilon$, $m$, $S = \{k_0, k_1, \ldots, k_r\}$, $\mathcal{B} = \{B_0, B_1, \ldots, B_\ell\}$

$\tilde{p}_1, \ldots, \tilde{p}_m$ ← $1/m$
$k$ ← $C_{max} / \left( (1 - \alpha) m + \alpha \left( \frac{m+1}{2} \right) \right)$
advSuccess ← 1
for $x = 0, \ldots, \ell$ do
    $B$ ← $B_x$
    $\tilde{p}_x, k_x$ ← Optimize$(p, n, B, C_{max}, \alpha, \epsilon, m, S)$
    CS ← RationalAdvSuccess$(p, n, \hat{v}, \tilde{p}_x, k_x)$
    if CS < advSuccess then
        $\tilde{p}$ ← $\tilde{p}_x$
        $k$ ← $k_x$
        advSuccess ← CS
    end if
end for
return $\tilde{p}, k$






We remark that the subroutine RationalAdvSuccess can be computed in time $O(n' m \log mn')$. The most expensive step is sorting the $mn'$ pairs $(p_i, \tilde{p}_j)$ based on the value $p_i \cdot \tilde{p}_j$. Once we have these pairs in sorted order there is a simple formula for computing the marginal benefit/costs of a larger threshold $B$.
See Algorithm [7] in the appendix for more details. 

We remark that Algorithm 4 is not guaranteed to always find the optimal solution to Optimization Goal 1. It may be viewed as a heuristic algorithm that generates many promising candidate CASH distributions and then selects the best distribution among them.


6 Experimental Results 


In this section we empirically demonstrate that our CASH mechanism can be used to significantly reduce the fraction of accounts that an offline adversary could compromise. We implemented Algorithm 4 in C# using Gurobi as our LP solver, and analyzed CASH using two real-world password distributions $p_1, \ldots, p_n$. The first distribution is based on data from the RockYou password breach (32+ million passwords) and the second is based on password frequency data from Yahoo! users (representing $\approx 70$ million passwords). The latter dataset was not the result of a security breach. Instead, Yahoo! gave Bonneau permission to collect and analyze password frequency data in a carefully controlled environment. Yahoo! recently allowed Blocki et al. [16] to use a differentially private [34] algorithm to publish this data. Thus, the password frequency data in this data set has been perturbed slightly. Blocki et al. [16] also showed that with high probability the L1 error introduced by their algorithm would be minimal.

In each of our experiments we fix the password correctness rate $\alpha \in \{1, 0.95, 0.9\}$ and the maximum amortized server cost $C_{max}$ before using Algorithm 4 to find CASH parameters $\tilde{p}_1, \ldots, \tilde{p}_m$ and $k$ subject to the appropriate constraints on the amortized server costs.

We compare the % of cracked passwords under three different scenarios: 


• (Deterministic Key-Stretching) The authentication server selects a hash function with cost parameter $k = C_{max}$ (achieved through traditional deterministic key-stretching techniques). The rational value $v$ adversary will crack each password with the probability given by eq. (1).

• (Uniform-CASH) The authentication server uses CASH with the uniform distribution. He sets $k$ to ensure that his amortized costs are at most $C_{max}$. A rational value $v$ adversary then cracks each password with the corresponding uniform-CASH success probability.
• (CASH) Given an estimate $\hat{v}$ of the adversary's budget we used Algorithm 4 to optimize the CASH parameters $k$ and $\tilde{p}_1, \ldots, \tilde{p}_m$ subject to the constraint that the amortized server cost is at most $C_{max}$ when users enter the wrong password with probability $1 - \alpha$. We fixed the parameters $m = 50$, $\epsilon = 0.02$,
and we set 13 — {5 * Crnax x 10 , Cmax X 10 , Cmax X 10 ,1.5* Cmax X 10 , 2.0 * Cmax X 10 , 2.5 * Cmax X 10 , 
2.65 • Cmax X 10'^, 2.8 • Cmax X 10^, 3.0 • Cmax X lO'^, 5.0 • Cmax X lO'^, Cmax X 10®}. Thus, Algorithm]^ 
computes the optimal distribution against a threshold $B$ adversary for each $B \in \mathcal{B}$, and selects the best distribution $\tilde{p}$ against a value $\hat{v}$ adversary. We report the fraction of cracked passwords both when the true value is $v = \hat{v}$ and when the adversary's true value is $v \neq \hat{v}$.

Our results indicate that an authentication server could significantly reduce the fraction of compromised passwords in an offline attack by adopting our optimal CASH mechanism instead of deterministic key-stretching or uniform-CASH. These results held robustly for both the RockYou and Yahoo! password distributions.


6.0.1 Password Datasets 

We use two password frequency datasets, RockYou and Yahoo!, to analyze our CASH mechanism. The RockYou dataset contains passwords from $N \approx 32.6$ million RockYou users, and the Yahoo! dataset contains data from $N \approx 70$ million Yahoo! users. We used frequency data from each of these datasets to obtain an empirical password distribution $p_1 \ge p_2 \ge p_3 \ge \ldots \ge p_n$ over $\mathcal{P}$.

The RockYou dataset is based on actual user passwords which were leaked during the infamous RockYou security breach (RockYou had been storing these passwords in the clear). The total number of unique passwords in the dataset was $n \approx 14.3$ million. Approximately 11.9 million of these passwords were unique to one RockYou user. The other $\approx 2.5$ million passwords were used by multiple users. The most popular password ($pwd_1$ = '123456') was shared by $\approx 0.3$ million RockYou users ($p_1 \approx 0.01$). RockYou did not impose strict password restrictions on its users (e.g., users were allowed to select passwords consisting of only lowercase letters or only numbers).

We also used (perturbed) password frequency data from a dataset of $N \approx 70$ million Yahoo! passwords. See [20] for more details about how this data was collected and see [16] for more details about how the frequency data was perturbed to satisfy the rigorous notion of differential privacy [34]. Blocki et al. [16] proved that with high probability the L1 distortion of the perturbed frequency data is bounded by $O(\sqrt{N}/\epsilon)$, where the privacy parameter was set to $\epsilon = 0.25$ when the Yahoo! dataset was published. Thus, the perturbed dataset will also still give us a good estimate of the empirical password distribution.


6.1 Results 

Our first set of experimental results is shown in Figures 1 and 2. These plots were computed under the assumption that $\alpha = 1$ (users always enter their passwords correctly), and that $\hat{v} = v$ (the defender knows the exact adversary value). The results show that for some (higher) adversary values our non-uniform CASH distributions improve significantly on the cost-equivalent versions of uniform CASH (50% reduction in cracked passwords) and deterministic key-stretching (56% reduction in cracked passwords). Figures 4a and 4b (resp. Figures 3a and 3d) show the same results under the assumptions that $\alpha = 0.9$ (resp. $\alpha = 0.95$).

Further panels (e.g., Figures 3b and 3e) explore the effect of a wrong estimate $\hat{v} \neq v$ of the adversary's value for both the RockYou and Yahoo! datasets. Despite receiving the wrong estimate $\hat{v}$ our algorithm returns a distribution that is (almost always) slightly better than the corresponding uniform CASH distribution. Both distributions still significantly outperform the cost equivalent deterministic key-stretching solution.

Additional figures in the appendix explore what happens when the defender uses the wrong empirical password distribution when searching for a good CASH distribution $\tilde{p}$ (e.g., if the defender optimizes $\tilde{p}$

We note that we would expect to see relatively high adversary values $v / C_{max}$ in the offline setting because $C_{max}$ will typically be quite small (e.g., $10^{-6}$ dollars).
Figure 1: Yahoo Dataset: $\alpha = 1$. (Horizontal axis: adversary value.)

Figure 2: RockYou Dataset: $\alpha = 1$. (Horizontal axis: adversary value.)


under the assumption that the empirical password distribution is given by the Yahoo! dataset when the 
actual distribution is given by the RockYou dataset). Briefly, these plots show that non-uniform CASH 
significantly outperforms deterministic key-stretching even when non-uniform CASH is optimized under the 
wrong distribution and non-uniform CASH slightly outperforms uniform CASH on most, but not all, of the 
curve. 
Figure 3: $\alpha = 0.95$. Panels: (a) Yahoo; (b) Yahoo: $v \neq \hat{v}$; (c) Yahoo: $v \neq \hat{v}$; (d) RockYou; (e) RockYou: $v \neq \hat{v}$; (f) RockYou: $v \neq \hat{v}$. (Horizontal axis: adversary value.)


6.2 Discussion 


In our experiments we varied the password correctness rate $\alpha \in \{0.9, 0.95, 1\}$. Intuitively, we expect CASH to have a greater advantage over traditional key-stretching techniques when $\alpha$ is larger, but when $\alpha \to 0$ we should not expect CASH or uniform-CASH to outperform deterministic key-stretching techniques because there is no advantage in making authentication costs asymmetric. It is easier for users to remember passwords that they use frequently [17, 22, 52], so we would expect $\alpha$ to be larger for services that are used frequently (e.g., e-mail). This suggests that larger values of $\alpha$ (e.g., $\alpha = 0.9$ or $\alpha = 0.95$) would be appropriate for many services because the users who authenticate most frequently would be the least likely to enter incorrect passwords. While different authentication servers might experience different failed login rates $1 - \alpha$, we remark that it is reasonable to assume that the authentication server knows the value of $\alpha$ because it can monitor login attempts.


Estimating v While our results suggest that CASH continues to perform well even if our estimate $\hat{v}$ of the adversary's value $v$ for cracked passwords is wrong, we would still recommend that an authentication server perform a careful economic analysis to obtain the estimate $\hat{v}$ before running Algorithm 4 to compute the CASH distribution $\tilde{p}$. The organization should take into account empirical data on the cost Cost(H) of computing the underlying hash function as well as the market value of a cracked password. If possible, we recommend that the organization consider data from black market sales of passwords for similar types of accounts (e.g., an adversary would likely value a cracked Bank of America password more than a cracked Twitter password). Symantec reports that cracked passwords are sold on the black market for $4-$30 [36]. Thus, $30/Cost(H) might be a reasonable upper bound on the adversary's value for a cracked password (measured in number of computations of H). We would also strongly advocate for the use of memory hard functions instead of hash iteration to increase Cost(H) effectively (see discussion in Section 2.2).

Figure 4: $\alpha = 0.9$.


Obtaining an Empirical Password Distribution We remark that the specific CASH distributions we computed for the RockYou and Yahoo! datasets might not be optimal in other application settings because the underlying password distribution may vary across different contexts. For example, users might be more motivated to pick strong passwords for higher value accounts (e.g., bank accounts). Similarly, some organizations choose to restrict the passwords that a user may select (e.g., requiring upper and lower case letters). While these restrictions do not always result in stronger passwords, they can alter the underlying password distribution [18]. While the underlying distribution may vary from context to context, we note that an authentication server could always follow the framework of Bonneau [20] and Blocki et al. [16] to securely approximate the password distribution $p_1, \ldots, p_n$ of its own users.

If an organization remains highly uncertain about the value $v$ of a cracked password or about the empirical password distribution $p_1, \ldots, p_n$ then it may be prudent to adopt the uniform-CASH mechanism (e.g., [46]), which always performs at least as well as the traditional key-stretching approach.


6.2.1 Experimental Limitations 

We remark that the fractions of cracked passwords that we compute in our experiments may be less realistic for larger adversary values (e.g., $10^6$). The reason is that $p_i$, our empirical estimate of the probability of password $pwd_i$, will be too high for many of the unique passwords in the dataset. For example, consider a dedicated user who memorizes a truly random 20 character string of upper and lower case letters. The true probability that any individual password guess matches the user's password would be at most $1/52^{20} \approx 1/(2.09 \times 10^{34})$. However, if that password occurred in the RockYou dataset then our empirical estimate of this probability would be at least $1/(3.26 \times 10^7)$. Developing improved techniques for estimating the true likelihood of unique passwords in a password frequency dataset is an important research direction.
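The size of this overestimate can be checked with a few lines of arithmetic. This is only an illustrative calculation using the figures quoted above (20 random letters from a 52-character alphabet, and roughly 32.6 million RockYou accounts); the variable names are hypothetical.

```python
# Number of equally likely 20-character strings over upper and lower case letters.
keyspace = 52 ** 20

# True per-guess probability versus the smallest possible empirical estimate
# for a password that appears once among about 32.6 million accounts.
true_probability = 1 / keyspace
empirical_probability = 1 / 3.26e7

# Factor by which the empirical estimate overstates the true probability.
overestimate_factor = empirical_probability / true_probability
```

Under these assumptions the empirical estimate is too large by a factor on the order of $10^{26}$.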


7 Related Work 


Breaches. Recent breaches highlight the importance of proper password storage. In one of these instances passwords were stored on the authentication server in cleartext and in other instances the passwords were not salted. Salting is a simple, yet effective, way to defend against rainbow table attacks [49], which can be used to dramatically reduce the cost of an offline attack against unsalted passwords. Bonneau and Preibusch [21] found that implementation errors like these are unfortunately commonplace.


Key Stretching. The practice of key stretching was proposed as early as 1979 by Morris and Thompson [48]. The goal is to make the hash function more expensive to evaluate so that an offline attack is more expensive for the adversary. PBKDF2 [43] and BCRYPT [54] use hash iteration to accomplish this goal. The recent Ashley Madison breach highlights the benefits of key-stretching in practice. Through an implementation mistake half of the Ashley Madison passwords were protected with the MD5 hash function instead of the much stronger BCRYPT hash function, allowing offline password crackers to quickly recover these passwords. More modern password hash functions like SCRYPT [51] use memory hard functions for key-stretching. Recently, the Password Hashing Competition was organized to encourage the development of alternative password hashing schemes (e.g., [10]). Argon2 [12], the winner, has parameters which control memory usage and parallelism. Deterministic key-stretching methods result in proportionally increased costs for the legitimate server as well as the adversary. Manber [46] proposed the use of hidden salt values (e.g., 'pepper') to make it more expensive to reject incorrect passwords. CASH may be viewed as a generalization of this idea. Boyen [23] proposed using halting puzzles to introduce an extreme asymmetry: the password verification algorithm never halts when we try an incorrect password. However, in practice an authentication server will need to upper bound the maximum running time for authentication because even legitimate users may occasionally enter the wrong password.

Other Defenses Against Offline Attacks. If an organization has multiple servers for authentication then it is possible to distribute storage of the passwords across multiple servers to keep them safe from an adversary who only breaches one server (e.g., see [25]). Juels and Rivest proposed storing the hashes of fake passwords (honeywords) and using a second auxiliary server to detect an offline attack (authentication attempt with honeywords). Another line of research has sought to include the solution(s) to hard artificial intelligence problems in the password hash so that an offline attacker needs human assistance [28]. By contrast, CASH does not require an organization to purchase and maintain multiple (distributed) authentication servers and it could be adopted without altering the user's authentication experience (e.g., by requiring the user to solve CAPTCHAs).

Measuring Password Strength. Guessing-entropy [47, 57], $\sum_{i=1}^{n} i \cdot p_i$, measures the average
number of guesses needed to crack a single password. We use a similar formula to compute how much work a
threshold-$B$ adversary would do in expectation. Guessing-entropy and Shannon-entropy are known to be poor
metrics for measuring password strength. While minimum entropy, $H_\infty = -\log p_1$, can be used to
estimate the fraction of passwords that could be cracked in an online attack [18], it can provide an overly
pessimistic security measurement in general.


Boztas [24] proposed a metric called $\beta$-guesswork, which measures the success rate for an adversary
with $\beta$ guesses per account, $\sum_{i=1}^{\beta} p_i$. We use a similar formula for computing the
success rate of a threshold-$B$ adversary against our CASH mechanism; the key difference is that the
adversary must guess the random value $t_u$ as well as the user’s password $pwd_u$. Pliam proposed a similar
metric called $\alpha$-guesswork [53], which measures the number of password guesses the adversary would need
(per user) to achieve success rate $\alpha$.

See http://arstechnica.com/security/2015/09/once-seen-as-bulletproof-11-million-ashley-madison-passwords-already-cracked/
(retrieved 5/4/2016).

Guessing-entropy could be high even if half of our users choose the same password ($p_1 = 0.5$) as long as
the other half of our users choose a password uniformly at random from $\mathcal{P}$
($p_2 = \ldots = p_n = \frac{1}{2(n-1)}$).
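The metrics discussed above are easy to compare on a concrete distribution. The following Python sketch is illustrative only: the function names are hypothetical, the base-2 logarithm is an assumption (the text leaves the base unspecified), and `guesses_for_success_rate` follows the informal description of $\alpha$-guesswork given here rather than any particular formal definition.

```python
import math


def guessing_entropy(probs):
    """Average number of guesses when passwords are tried in decreasing order of probability."""
    ordered = sorted(probs, reverse=True)
    return sum(i * p for i, p in enumerate(ordered, start=1))


def min_entropy(probs):
    """H_inf = -log2(p_1), where p_1 is the most likely password (base 2 assumed)."""
    return -math.log2(max(probs))


def beta_guesswork(probs, beta):
    """Success rate of an adversary who tries the beta most likely passwords."""
    return sum(sorted(probs, reverse=True)[:beta])


def guesses_for_success_rate(probs, alpha):
    """Smallest number of guesses whose cumulative probability reaches alpha."""
    total = 0.0
    for guesses, p in enumerate(sorted(probs, reverse=True), start=1):
        total += p
        if total >= alpha:
            return guesses
    return len(probs)


# Half of the users share one password; the rest choose uniformly among 1000 others.
n = 1001
probs = [0.5] + [1 / (2 * (n - 1))] * (n - 1)
print(guessing_entropy(probs), min_entropy(probs), beta_guesswork(probs, 1))
```

On this distribution the guessing-entropy is in the hundreds even though a single guess already cracks half of the accounts, which is the weakness of guessing-entropy as a strength metric.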


Encouraging Users to Memorize Stronger Passwords. A separate line of research has focused on
helping users memorize stronger passwords using various mnemonic techniques and/or rehearsal techniques
(e.g., [14, 22, 39]).

Password managers seek to minimize user burden by using a single password to generate multiple
passwords [55]. These password managers often use client-side key stretching to derive each password. While
CASH is a useful tool for server-side key stretching, our current version of CASH is not appropriate for
client-side key stretching because the authentication procedure is not deterministic. In subsequent work,
Blocki and Sridhar [19] developed Client-CASH, an extension of CASH suitable for client-side key stretching.
Password Alternatives. Another line of research has focused on developing alternatives to text passwords
like graphical passwords. Herley and van Oorschot argued that text passwords will remain the dominant means
of authentication despite attempts to replace them. We note that CASH could be used to protect graphical
passwords as well as text passwords.

8 Conclusions

We presented a novel Stackelberg game model which captures the essential elements of the interaction 
between an authentication server (leader) and an offline password cracker (follower). Our Stackelberg model 
can provide guidance for the authentication server by providing an estimate of how significantly key-stretching 
reduces the number of passwords that would be cracked by a rational offline adversary in the event of a server 
breach. We also introduced CASH, a randomized secure hashing algorithm that significantly outperforms
traditional key-stretching defenses in our Stackelberg game. While the problem of computing an exact
Stackelberg equilibrium is non-convex, we were able to find an efficient heuristic algorithm to compute good
strategies for the authentication server. Our heuristic algorithm is based on a highly related problem that 
we are able to show is tractable. Finally, we analyzed the performance of our CASH mechanism using 
empirical password data from two large scale password frequency datasets: Yahoo! and RockYou. Our 
empirical analysis demonstrates that the CASH mechanism can significantly (e.g., 50%) reduce the fraction 
of passwords that would be cracked in an offline attack by a rational adversary. Thus, our CASH mechanism 
can be used to mitigate the threat of offline attacks without increasing computation costs for a legitimate 
authentication server.

Missing Proofs

Reminder of Theorem 1. We can find the solutions to Optimization Goal 2 in polynomial time in $m$, $n$
and $L$, where $L$ is the bit precision of our inputs.

Proof of Theorem 1 (sketch). We first note that the convex feasible space from Optimization Goal 2 fits
inside a ball of radius one. Thus, the Ellipsoid algorithm [44] will converge after making $poly(m)$ many
queries to our separation oracle. By Theorem 2 the running time of the separation oracle is $O(mn \log mn)$.
Thus, the total running time is polynomial in $m$ and $n$. □


.1 Separation Oracle 

The key idea behind Theorem 1 is to develop a polynomial time separation oracle. A separation oracle is an
algorithm that takes as input a convex set $K \subseteq \mathbb{R}^m$ and a point $p \in \mathbb{R}^m$. The
separation oracle outputs “Ok” if $p \in K$; otherwise it returns a hyperplane separating $p$ from $K$. In
our context, the separation oracle takes as input a proposed solution
$\tilde{p}_1, \ldots, \tilde{p}_m, P_{Adv,B}$ and outputs “Ok” if every constraint from Optimization Goal 2
is satisfied; otherwise the separation oracle finds a constraint that is not satisfied. If we can develop a
polynomial time separation oracle for our linear program then we can use the ellipsoid algorithm to solve
our linear program in polynomial time [44]. For our purposes, it is not necessary to understand how the
ellipsoid algorithm works. We will treat the ellipsoid algorithm as a blackbox that can solve a linear
program in polynomial time given oracle access to a separation oracle.

We now present a separation oracle for Goal 2. Theorem 2 states that Algorithm 5 is a polynomial time
separation oracle. We provide intuition for our separation oracle below. Theorem 1 follows immediately
because we can run the ellipsoid algorithm with our separation oracle to solve Goal 2 in polynomial
time.


Theorem 2. Algorithm 5 is a valid separation oracle for Goal 2 and runs in time $O(mn \log mn)$.

Algorithm 5 Separation Oracle. Output is an unsatisfied constraint $C$ or “Ok” if every constraint is
satisfied.

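Input: $p_1, \ldots, p_n$, $\tilde{p}_1, \ldots, \tilde{p}_m$, $P_{Adv,B}$, $B$, $C_{max}$, $k$, $\alpha$, $m$ and $n$

if $\sum_{i=1}^{m} \tilde{p}_i \neq 1$ then
    return $\sum_{i=1}^{m} \tilde{p}_i = 1$.
end if
if $(1 - \alpha) m \cdot k + k \cdot \alpha \sum_{i=1}^{m} i \cdot \tilde{p}_i > C_{max}$ then
    return $(1 - \alpha) m \cdot k + k \cdot \alpha \sum_{i=1}^{m} i \cdot \tilde{p}_i \leq C_{max}$.
end if
for $i = 1, \ldots, m$ do
    if $\tilde{p}_i < 0$ then
        return $\tilde{p}_i \geq 0$.
    end if
    if $i < m$ and $\tilde{p}_{i+1} > \tilde{p}_i$ then
        return $\tilde{p}_{i+1} \leq \tilde{p}_i$.
    end if
end for
if $P_{Adv,B} > 1$ then
    return $P_{Adv,B} \leq 1$.
end if
if $P_{Adv,B} < 0$ then
    return $P_{Adv,B} \geq 0$.
end if
for $i = 1, \ldots, n$ do
    for $j = 1, \ldots, m$ do
        $\tilde{p}_{i,j} \leftarrow p_i \cdot \tilde{p}_j$.
    end for
end for
TUPLES $\leftarrow \{(i, j) \mid 1 \leq i \leq n \wedge 1 \leq j \leq m\}$.
Define ordering $\succ$ over TUPLES: $(i_1, j_1) \succ (i_2, j_2)$ if any of the following conditions hold:
    (1) $\tilde{p}_{i_1,j_1} > \tilde{p}_{i_2,j_2}$, or
    (2) $\tilde{p}_{i_1,j_1} = \tilde{p}_{i_2,j_2}$ and $i_1 < i_2$, or
    (3) $\tilde{p}_{i_1,j_1} = \tilde{p}_{i_2,j_2}$ and $i_1 = i_2$ and $j_1 < j_2$.
SORTED-TUPLES $\leftarrow$ SORT(TUPLES, $\succ$). {Let $T_k$ = SORTED-TUPLES[$k$]; $T_k$ is the $k$'th biggest element according to $\succ$}
$S \leftarrow \{T_1, \ldots, T_B\}$.
for $i = 1, \ldots, n$ do
    $b'_i \leftarrow \max\{j \in \mathbb{Z} \mid j = 0 \vee (i, j) \in S\}$
end for
if $P_{Adv,B} < \sum_{i=1}^{n} p_i \sum_{j=1}^{b'_i} \tilde{p}_j$ then
    return $P_{Adv,B} \geq \sum_{i=1}^{n} p_i \sum_{j=1}^{b'_i} \tilde{p}_j$
else
    return “Ok”
end if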
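The separation oracle above translates almost line for line into Python. The sketch below is an illustration under stated assumptions rather than the implementation used for the experiments: the function name, the string encoding of the returned constraint, and the floating point tolerance `tol` are all hypothetical, and `budget` is the number of tuples placed in $S$.

```python
def separation_oracle(p, p_tilde, p_adv, budget, c_max, k, alpha, tol=1e-9):
    """Return "Ok" or a description of one violated constraint."""
    n, m = len(p), len(p_tilde)

    # The CASH distribution must sum to one.
    if abs(sum(p_tilde) - 1.0) > tol:
        return "sum_j p_tilde[j] = 1"

    # The amortized server cost must stay within C_max.
    expected_rounds = sum(j * q for j, q in enumerate(p_tilde, start=1))
    if (1 - alpha) * m * k + alpha * k * expected_rounds > c_max + tol:
        return "(1 - alpha)*m*k + alpha*k*sum_j j*p_tilde[j] <= C_max"

    # Probabilities must be non-negative and non-increasing.
    for j in range(m):
        if p_tilde[j] < -tol:
            return f"p_tilde[{j + 1}] >= 0"
        if j + 1 < m and p_tilde[j + 1] > p_tilde[j] + tol:
            return f"p_tilde[{j + 2}] <= p_tilde[{j + 1}]"

    if p_adv > 1 + tol:
        return "P_adv <= 1"
    if p_adv < -tol:
        return "P_adv >= 0"

    # Sort all (password, running time) pairs by probability, ties broken by (i, j).
    ranked = sorted(
        ((p[i] * p_tilde[j], i, j) for i in range(n) for j in range(m)),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    alloc = [0] * n  # alloc[i] plays the role of b'_i
    for _, i, j in ranked[:budget]:
        alloc[i] = max(alloc[i], j + 1)

    best = sum(p[i] * sum(p_tilde[:alloc[i]]) for i in range(n))
    if p_adv < best - tol:
        return f"P_adv >= success rate of allocation {alloc}"
    return "Ok"
```

Returning a human-readable string keeps the sketch short; a version meant to drive an ellipsoid solver would instead return the coefficients of the violated inequality.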


Intuitively, the idea behind the separation oracle is quite simple. Suppose that we want to verify that
the variable assignment $\tilde{p}_1, \ldots, \tilde{p}_m, P_{Adv,B}$ is feasible. The first few steps of our
separation oracle verify that constraints (1)-(4) from Goal 2 are satisfied by the assignment
$\tilde{p}_1, \ldots, \tilde{p}_m$. These straightforward checks simply verify that the proposed CASH
distribution $\tilde{p}_1, \ldots, \tilde{p}_m$ is valid and that the server’s amortized costs are at most
$C_{SRV,\alpha}$.

The next step, verifying that all type (5) constraints are satisfied, is a bit more challenging because there
are exponentially many constraints. Recall that these constraints intuitively ensure that $P_{Adv,B}$ is
indeed an upper bound on the success rate of the optimal adversary given CASH distribution
$\tilde{p}_1, \ldots, \tilde{p}_m$. While we don’t have time to check every feasible budget allocation
$b \in \mathcal{F}_B$ for the adversary, it suffices to find the adversary’s optimal budget allocation $b'$
and verify that $P_{Adv,B}$ is an upper bound on the adversary’s success rate given allocation $b'$.

The adversary gets $\lfloor B/k \rfloor$ total guesses of the form $(pw_i, j)$ for each user $u$. The
probability that the guess $(pw_i, j)$ is correct is simply $\tilde{p}_{i,j} = p_i \cdot \tilde{p}_j$; the
guess is correct if and only if $u$ selected password $pwd_u = pwd_i$ and we selected CASH running time
parameter $t_u = j$. The adversary’s optimal strategy is simple: try the $\lfloor B/k \rfloor$ most likely
guesses. Thus, we can quickly find the adversary’s optimal budget allocation $b'$ by computing
$\tilde{p}_{i,j}$ for each pair $(pw_i, j)$ and sorting these values. This takes time $O(nm \log nm)$.
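A small worked instance may help make the greedy allocation concrete. The numbers are purely illustrative assumptions: $n = 2$ passwords with $p = (0.6, 0.4)$, a CASH distribution with $m = 2$ and $\tilde{p} = (0.7, 0.3)$, and a budget of $\lfloor B/k \rfloor = 2$ guesses.

\[
\tilde{p}_{1,1} = 0.42, \qquad \tilde{p}_{2,1} = 0.28, \qquad \tilde{p}_{1,2} = 0.18, \qquad \tilde{p}_{2,2} = 0.12 .
\]

The two most likely guesses are $(pw_1, 1)$ and $(pw_2, 1)$, so $b' = (1, 1)$ and the adversary succeeds with probability

\[
\sum_{i=1}^{2} p_i \sum_{j=1}^{b'_i} \tilde{p}_j = 0.42 + 0.28 = 0.70 .
\]

By comparison, spending both guesses on the first password, $b = (2, 0)$, yields only $0.42 + 0.18 = 0.60$.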

Reminder of Theorem 2. Algorithm 5 is a valid separation oracle for Goal 2 and runs in time $O(mn \log mn)$.


Proof of Theorem 2 (Sketch). The most expensive step in our algorithm is sorting the $\tilde{p}_{i,j}$ values.
There are $mn$ such values so the algorithm takes $O(mn \log mn)$ steps. We now argue that our separation
oracle has correct behavior.

Suppose first that there is a constraint $C$ from Optimization Goal 2 that is not satisfied by
$\tilde{p}_1, \ldots, \tilde{p}_m, P_{Adv,B}$. It is easy to verify that our separation oracle will catch
violations of constraints (1)-(4) so we can assume that $C$ is a violated type (5) constraint
$P_{Adv,B} < \sum_{i=1}^{n} p_i \sum_{j=1}^{b^C_i} \tilde{p}_j$ where
$(b^C_1, \ldots, b^C_n) \in \mathcal{F}_B$. Let $b'$ denote the budget allocation obtained by sorting the
$\tilde{p}_{i,j}$ values and then greedily selecting a set $S'$ of the largest values until the budget
expires; we define $b'_i$ to be the number of values of the form $\tilde{p}_{i,j}$ that are selected and
$S' = \{(i, j) \mid i \leq n \wedge j \leq b'_i\}$. It suffices to argue that

\[
\sum_{i=1}^{n} p_i \sum_{j=1}^{b'_i} \tilde{p}_j \geq \sum_{i=1}^{n} p_i \sum_{j=1}^{b^C_i} \tilde{p}_j
\]

because in this case our algorithm will return the violated constraint

\[
P_{Adv,B} \geq \sum_{i=1}^{n} p_i \sum_{j=1}^{b'_i} \tilde{p}_j .
\]

Let $S^C = \{(i, j) \mid i \leq n \wedge j \leq b^C_i\}$. We first observe that

\[
\sum_{(i,j) \in S'} \tilde{p}_{i,j} \geq \sum_{(i,j) \in S^C} \tilde{p}_{i,j}
\]

by construction of $S'$. Thus,

\[
\sum_{i=1}^{n} p_i \sum_{j=1}^{b'_i} \tilde{p}_j
= \sum_{(i,j) \in S'} \tilde{p}_{i,j}
\geq \sum_{(i,j) \in S^C} \tilde{p}_{i,j}
= \sum_{i=1}^{n} p_i \sum_{j=1}^{b^C_i} \tilde{p}_j .
\]

Finally, when the solution $\tilde{p}_1, \ldots, \tilde{p}_m, P_{Adv,B}$ does satisfy all constraints from
Optimization Goal 2 our algorithm will not find a budget allocation $b'_1, \ldots, b'_n$ such that

\[
P_{Adv,B} < \sum_{i=1}^{n} p_i \sum_{j=1}^{b'_i} \tilde{p}_j .
\]

In this case our algorithm will return “Ok”, the desired outcome.

□
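The exchange argument in the proof can be sanity-checked by brute force on a tiny instance. The Python sketch below is only an illustration: it assumes that the feasible set $\mathcal{F}_B$ consists of allocations $(b_1, \ldots, b_n)$ with $0 \leq b_i \leq m$ and $\sum_i b_i$ at most the number of available guesses, and all function names are hypothetical.

```python
from itertools import product


def success_rate(p, p_tilde, alloc):
    """Adversary's success rate when b_i guesses are spent on password i."""
    return sum(pi * sum(p_tilde[:b]) for pi, b in zip(p, alloc))


def greedy_allocation(p, p_tilde, budget):
    """Pick the `budget` most likely (password, running time) pairs."""
    n, m = len(p), len(p_tilde)
    ranked = sorted(
        ((p[i] * p_tilde[j], i, j) for i in range(n) for j in range(m)),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    alloc = [0] * n
    for _, i, j in ranked[:budget]:
        alloc[i] = max(alloc[i], j + 1)
    return alloc


def best_by_brute_force(p, p_tilde, budget):
    """Maximum success rate over every allocation that fits in the budget."""
    m = len(p_tilde)
    feasible = (a for a in product(range(m + 1), repeat=len(p)) if sum(a) <= budget)
    return max(success_rate(p, p_tilde, a) for a in feasible)


p = [0.5, 0.3, 0.2]
p_tilde = [0.5, 0.3, 0.2]  # must be non-increasing, as constraint (1) requires
for budget in range(10):
    greedy = success_rate(p, p_tilde, greedy_allocation(p, p_tilde, budget))
    assert abs(greedy - best_by_brute_force(p, p_tilde, budget)) < 1e-12
```

The brute-force search is exponential in $n$ and is useful only as a check; the greedy allocation is what the oracle actually computes.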


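Algorithm 6 InitialConstraints($C_{max}$, $\alpha$, $k$)

Input: $C_{max}$, $\alpha$, $k$

$C \leftarrow C \cup \{\sum_{i=1}^{m} \tilde{p}_i = 1\}$.
$C \leftarrow C \cup \{1 \geq \tilde{p}_m \geq 0\}$.
$C \leftarrow C \cup \{(1 - \alpha) m \cdot k + \alpha \cdot k \sum_{i=1}^{m} i \cdot \tilde{p}_i \leq C_{SRV,\alpha}\}$.
for $i = 1, \ldots, m-1$ do
    $C \leftarrow C \cup \{1 \geq \tilde{p}_i \geq 0\}$.
    $C \leftarrow C \cup \{\tilde{p}_i \geq \tilde{p}_{i+1}\}$.
end for
$C \leftarrow C \cup \{1 \geq P_{Adv,B} \geq 0\}$.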
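One way to make the initial constraint set concrete is to encode each constraint as a row of a linear system over the variable vector $x = (\tilde{p}_1, \ldots, \tilde{p}_m, P_{Adv,B})$. The Python sketch below is an assumption-laden illustration: the `(coefficients, sense, right-hand side)` row format and the helper names are hypothetical and are not taken from the paper.

```python
def initial_constraints(m, c_srv, alpha, k):
    """Rows (coeffs, sense, rhs) over x = (p_tilde_1, ..., p_tilde_m, P_adv)."""
    def unit(idx):
        row = [0.0] * (m + 1)
        row[idx] = 1.0
        return row

    # The CASH distribution sums to one.
    rows = [([1.0] * m + [0.0], "==", 1.0)]
    # Amortized server cost, with the constant term moved to the right-hand side.
    cost_row = [alpha * k * j for j in range(1, m + 1)] + [0.0]
    rows.append((cost_row, "<=", c_srv - (1 - alpha) * m * k))
    for j in range(m):
        rows.append((unit(j), ">=", 0.0))
        rows.append((unit(j), "<=", 1.0))
        if j + 1 < m:
            diff = unit(j)
            diff[j + 1] = -1.0  # p_tilde_j - p_tilde_{j+1} >= 0
            rows.append((diff, ">=", 0.0))
    # Bounds on the adversary's success rate.
    rows.append((unit(m), ">=", 0.0))
    rows.append((unit(m), "<=", 1.0))
    return rows


def satisfies(rows, x, tol=1e-9):
    """Check a candidate point against every row."""
    for coeffs, sense, rhs in rows:
        lhs = sum(c * xi for c, xi in zip(coeffs, x))
        if sense == "==" and abs(lhs - rhs) > tol:
            return False
        if sense == "<=" and lhs > rhs + tol:
            return False
        if sense == ">=" and lhs < rhs - tol:
            return False
    return True
```

The exponentially many type (5) constraints are deliberately absent from this set; they would be generated lazily, one violated constraint at a time.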


Algorithm 7 RationalAdvSuccess($p$, $n$, $v$, $\tilde{p}$, $k$)

Input: $p_1, \ldots, p_{n'}$, $n_1, \ldots, n_{n'}$, $v$, $\tilde{p}$, $k$

curSuccess $\leftarrow 0$
curThreshold $\leftarrow 0$
curUtility $\leftarrow 0$
bestUtilityFound $\leftarrow 0$
bestUtilitySuccess $\leftarrow 0$
$T \leftarrow \emptyset$
for $i = 1, \ldots, n'$ do
    for $j = 1, \ldots, m$ do
        $T$.Add($p_i \cdot \tilde{p}_j$, $n_i$)
    end for
end for
Sort($T$). {Use first component $p_i \cdot \tilde{p}_j$ for comparison (greatest to least)}
for $t \in T$ do
    $(\pi, count) \leftarrow t$
    curThreshold $\leftarrow$ curThreshold $+ count$
    curSuccess $\leftarrow$ curSuccess $+ \pi \cdot count$
    $\Delta benefit \leftarrow v \cdot \pi \cdot count$
    $\Delta cost \leftarrow k \cdot \left( count \cdot (1 - \text{curSuccess}) + \frac{\pi \cdot count^2 + \pi \cdot count}{2} \right)$
    curUtility $\leftarrow$ curUtility $+ \Delta benefit - \Delta cost$
    if curUtility $>$ bestUtilityFound then
        bestUtilityFound $\leftarrow$ curUtility
        bestUtilitySuccess $\leftarrow$ curSuccess
    end if
end for
return bestUtilitySuccess
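The best-response computation can be sketched in Python as follows. This is a minimal illustration with hypothetical names, assuming that `cur_success` is a running total (so that `1 - cur_success` is the probability that the correct tuple lies outside everything guessed so far, including the newly added group) and that ties between equal probabilities may be broken arbitrarily.

```python
def rational_adv_success(probs, counts, v, p_tilde, k):
    """Fraction of passwords cracked by a utility-maximizing offline adversary.

    probs[i] is the probability of each password in group i and counts[i] is
    the number of passwords in that group; v is the value of a cracked
    password and k the cost of one hash evaluation.
    """
    groups = [(p * q, c) for p, c in zip(probs, counts) for q in p_tilde]
    groups.sort(key=lambda g: g[0], reverse=True)

    cur_success = 0.0
    cur_utility = 0.0
    best_utility = 0.0
    best_success = 0.0
    for pi, count in groups:
        cur_success += pi * count
        benefit = v * pi * count
        # Full cost if the correct tuple is still missing, about half if it is in this group.
        cost = k * (count * (1 - cur_success) + pi * count * (count + 1) / 2)
        cur_utility += benefit - cost
        if cur_utility > best_utility:
            best_utility = cur_utility
            best_success = cur_success
    return best_success


# One popular password (probability 0.5) and ten rarer ones (0.05 each).
print(rational_adv_success([0.5, 0.05], [1, 10], v=3, p_tilde=[1.0], k=1))
```

With a single-point CASH distribution (`p_tilde=[1.0]`) the sketch reduces to deterministic key-stretching, which gives a convenient baseline for comparison.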


While we do not have a polynomial time algorithm to compute the Stackelberg equilibrium of our game, 
it is always easy for the adversary to compute his best response. 

Theorem 3. Let $p_1 > \ldots > p_{n'}$ and $n_1, \ldots, n_{n'}$ define a probability distribution over
passwords in which there are $n_i$ passwords that each are chosen with probability $p_i$ and let
$\tilde{p}_1 \geq \ldots \geq \tilde{p}_m$ denote any CASH distribution. Then for any value $v$ and any hash
cost parameter $k$ we can compute the adversary’s optimal strategy in time $O(mn' \log mn')$.

Proof (sketch). Algorithm 7 computes the adversary’s optimal strategy. The most expensive step is sorting
the $mn'$ tuples, which takes time $O(mn' \log mn')$. Thus, Algorithm 7 runs in time $O(mn' \log mn')$.
Algorithm 7 iterates through the different possible thresholds that a rational adversary might select. The
variable curUtility keeps track of the utility at each threshold allowing us to remember which threshold was
optimal. Intuitively, Algorithm 7 will find the best strategy if and only if curUtility is always a correct
estimate of the adversary’s utility. Clearly, this is true initially (the utility of selecting threshold 0
is 0). Thus, by induction, it suffices to show that the formulas used to compute marginal cost and marginal
benefit are correct. If the adversary adds all of the tuples $(pwd, t)$ corresponding to $(\pi, count)$ to
the set of tuples to guess then the adversary is increasing the odds that he cracks the password by
$\pi \cdot count$ because he is adding $count$ tuples to his set of guesses and each tuple is correct with
probability $\pi$. Thus, his marginal benefit is $v \cdot \pi \cdot count$. To analyze marginal cost we
consider three cases: 1) The correct tuple $(pwd^*, t^*)$ was already in the adversary’s set of tuples to
guess. In this case we don’t increase the adversary’s guessing costs because he will always quit before he
guesses one of the new tuples we added. 2) The correct tuple $(pwd^*, t^*)$ is not already in the
adversary’s set of tuples and it is not in the new set of tuples we add. Thus, we increase the adversary’s
guessing costs by $k \cdot count$. 3) The correct tuple $(pwd^*, t^*)$ is in the new set of tuples we add.
In this case we increase the adversary’s guessing costs by $k \cdot \frac{count + 1}{2}$ in expectation. The
probability that we are in case 2 is $(1 - \text{curSuccess})$, where curSuccess already includes the newly
added tuples, and the probability that we are in case 3 is $\pi \cdot count$. Thus,

\[
\Delta cost = k \cdot \left( count \cdot (1 - \text{curSuccess}) + \frac{\pi \cdot count^2 + \pi \cdot count}{2} \right) .
\]

□
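The expected cost in case 3 can be spelled out with a one-line calculation. As an assumption for this illustration, the $count$ newly added tuples are tried in some order in which the correct tuple is equally likely to occupy each position; this is natural here because every tuple in the group has the same probability $\pi$.

\[
\mathbb{E}[\text{new guesses} \mid \text{case 3}] = \sum_{r=1}^{count} r \cdot \frac{1}{count} = \frac{count + 1}{2} .
\]

Weighting the three cases by their probabilities then gives

\[
\Delta cost = k \left( 0 \cdot \Pr[\text{case 1}] + count \cdot \Pr[\text{case 2}] + \frac{count + 1}{2} \cdot \Pr[\text{case 3}] \right)
= k \left( count \cdot (1 - \text{curSuccess}) + \frac{count + 1}{2} \cdot \pi \cdot count \right),
\]

which matches the marginal cost formula used in the proof once the last term is expanded to $\frac{\pi \cdot count^2 + \pi \cdot count}{2}$.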


Extra Plots 

Figures 5 and 6 explore what happens when the defender uses the wrong empirical password distribution
when searching for a good CASH distribution $\tilde{p}$ (e.g., if the defender optimizes $\tilde{p}$ under the
assumption that the empirical password distribution is given by the Yahoo! dataset when the actual
distribution is given by the RockYou dataset). Once again non-uniform CASH and uniform CASH both
significantly outperform deterministic key-stretching, and non-uniform CASH outperforms uniform CASH
(slightly) over most of the curve. Interestingly, in one part of the curve in one of these figures the
adversary’s success rate actually drops as $v$ increases. This would be impossible if the defender were
using the correct empirical password distribution. In this case the adversary’s success rate drops when $v$
increases because the defender switches to a different CASH distribution $\tilde{p}$ that happens to perform
better under the real distribution.

Figure 7 plots the fraction of cracked passwords against a value $v$ adversary when the defender selects
$\tilde{p}$ and $k$ under the assumption that the adversary’s value is $\hat{v}$ = 2.9 x $C_{max}$ x 10^
(using the empirical password distribution from the Yahoo dataset and setting $\alpha = 0.95$). Figure 8
plots the corresponding cumulative cost distribution for the authentication server induced by $\tilde{p}$,
$k$ and $\alpha$. For comparison, we also include the cumulative cost distributions for uniform CASH and
deterministic key-stretching under the same maximum cost parameter $C_{max}$.
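The cumulative cost distribution for the server can be computed directly from $\tilde{p}$, $k$ and $\alpha$. The Python sketch below is illustrative and rests on an assumption read off from the amortized cost constraint $(1 - \alpha) m \cdot k + \alpha \cdot k \sum_i i \cdot \tilde{p}_i$: a fraction $\alpha$ of login attempts use the correct password and cost $j \cdot k$ with probability $\tilde{p}_j$, while the remaining attempts use an incorrect password and always cost $m \cdot k$. The function names are hypothetical.

```python
def server_cost_cdf(p_tilde, k, alpha):
    """Return [(cost, Pr[server cost <= cost])] for cost = k, 2k, ..., m*k."""
    m = len(p_tilde)
    cdf = []
    running = 0.0
    for j, q in enumerate(p_tilde, start=1):
        running += alpha * q          # correct password, hidden value t = j
        if j == m:
            running += 1 - alpha      # incorrect password: all m rounds
        cdf.append((j * k, running))
    return cdf


def amortized_cost(p_tilde, k, alpha):
    """Expected cost per login attempt under the same assumptions."""
    m = len(p_tilde)
    expected_rounds = sum(j * q for j, q in enumerate(p_tilde, start=1))
    return (1 - alpha) * m * k + alpha * k * expected_rounds


for cost, prob in server_cost_cdf([0.5, 0.3, 0.2], k=2, alpha=0.95):
    print(cost, round(prob, 3))
```

Deterministic key-stretching corresponds to a distribution that places all of its mass on a single cost, so its cumulative distribution is a single step.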


The exception is a region (in one of Figures 5 and 6) where uniform CASH actually outperforms non-uniform
CASH (yielding a 15% reduction in cracked passwords in comparison to non-uniform CASH).

Figure 5: RockYou Results (Optimized for Yahoo): $\alpha = 0.95$.
Figure 6: Yahoo Results (Optimized for RockYou): $\alpha = 0.95$.
Figure 7: Yahoo: $v \neq \hat{v}$.
Figure 8: Yahoo: CASH Cumulative Probability Distribution.